On Wed, Jan 09, 2002 at 04:57:29PM +0900, Tomohiro KUBOTA wrote:
> Saying about round-trip compatibility, yes, round-trip compatibility
> for EUC-JP, EUC-KR, Big5, GB2312, GBK are guaranteed, i.e., Unicode
> is a superset of these encodings (character sets). However,
> (1) there are no authoritative mapping tables between these encodings
> and Unicode and there are various private mapping tables. This
> can cause portability problem around round-trip compatibility.

How major a problem is this in practice, right now?

One temporary solution I could suggest is having specs (in this case, Ogg tags) choose a specific vendor's translation tables for these encodings, and saying "until Unicode standardizes these tables, use these, not your system's." That would at least (try to) guarantee that until that happens, if a user enters text on one system in SJIS and moves it to another via UTF-8, he'll get the same SJIS output.

The obvious problem is that these interim tables will inevitably stick around a little while after the tables are standardized, even if the system vendor is quick and puts out an update in a week. I think, however, that some people just aren't going to update their systems (and so will use the obsolete vendor tables anyway), and the same people who wouldn't update their systems wouldn't update their editors.

When (hopefully not "if") the standardization happens, some users who are locally using these other encodings (and only transparently using UTF-8 in the file) will want the file updated, so that the JIS (etc.) they see is the same as it was before. That becomes easier when the interim table is known (add an "upgrade transcoding" option or similar, for the encoding that's being used). It couldn't be done automatically, unless a flag recording that the temporary translation table was in use for the tags was set, and then removed and deprecated once the standard tables come into use.

It would mean editors would have to have their own transcoder for these encodings until this happens. That could be provided. I assume only one such table would be needed for any given language. Presumably JIS<->EUC-JP is well-standardized, so if an interim Unicode<->JIS table is given, Unicode->JIS->EUC-JP could be used to get EUC-JP, for example. What other encodings could be avoided like this? (I don't know anything about Chinese or Korean encodings.)

Does anyone have any reasons why this would be a really bad idea? If not, does anyone have any suggestions of tables to use for different encodings?
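
To make the pinned-table idea concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: Python's "shift_jis" codec stands in for whatever interim table a spec might pick, "cp932" stands in for a different vendor's table, and the function names are made up for this example. It shows the two tables disagreeing on one SJIS character, the round trip holding when both ends use the pinned table, and an explicit "upgrade transcoding" step.

```python
# Sketch only: the table choices and function names are hypothetical.
PINNED_TABLE = "shift_jis"  # stand-in for the spec-chosen interim table
VENDOR_TABLE = "cp932"      # stand-in for some other vendor's table


def local_to_utf8(data: bytes, table: str = PINNED_TABLE) -> bytes:
    """Convert locally encoded tag text to UTF-8 for storage in the file."""
    return data.decode(table).encode("utf-8")


def utf8_to_local(data: bytes, table: str = PINNED_TABLE) -> bytes:
    """Convert stored UTF-8 tag text back to the local encoding."""
    return data.decode("utf-8").encode(table)


def upgrade_transcoding(stored: bytes, old_table: str, new_table: str) -> bytes:
    """Re-map stored UTF-8 so the local bytes are unchanged under new_table."""
    local = stored.decode("utf-8").encode(old_table)
    return local.decode(new_table).encode("utf-8")


sjis_text = b"\x81\x60"  # one double-byte SJIS character

# The two tables map the same bytes to different Unicode characters.
pinned_char = sjis_text.decode(PINNED_TABLE)
vendor_char = sjis_text.decode(VENDOR_TABLE)
print(f"U+{ord(pinned_char):04X} vs U+{ord(vendor_char):04X}")

# Same pinned table on both systems: the SJIS bytes survive the trip.
stored = local_to_utf8(sjis_text)
assert utf8_to_local(stored) == sjis_text

# Mismatched tables: the reader may fail to map the character back.
try:
    utf8_to_local(stored, table=VENDOR_TABLE)
    mismatch_ok = True
except UnicodeEncodeError:
    mismatch_ok = False
print("mismatched tables round-trip:", mismatch_ok)

# Explicit upgrade once a different table becomes the agreed one.
upgraded = upgrade_transcoding(stored, PINNED_TABLE, VENDOR_TABLE)
assert utf8_to_local(upgraded, table=VENDOR_TABLE) == sjis_text
```

The upgrade step goes back through the local bytes on purpose: the user's SJIS text is treated as the thing to preserve, and only the Unicode code point stored in the file changes.
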
It'd be nice to use tables that are likely to be as close as possible to whatever becomes the eventual standard, but that might be an impossible goal. What other encodings (besides the C, J and K ones) would need this?

--
Glenn Maynard
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
