Tomohiro KUBOTA wrote on 2002-04-02 14:29 UTC:
> Strictly speaking, JIS X 0213:2000 *cannot* be defined as a mapping
> table against ISO 10646, because JIS X 0213's han unification rule
> is different from ISO 10646's one. (You know, Unicode added several
> tens of compatibility ideographs which are "different characters" in
> JIS X 0213's point of view and "different glyphs of the same character"
> in Unicode's point of view.)

Again, that's just the old many-to-one issue, nothing critical. The
fact that Unicode contains both U+00B5 MICRO SIGN and U+03BC GREEK
SMALL LETTER MU (which are really the exact same character in most
people's view) didn't prevent ISO 8859-1 from being mapped to UCS by
assigning its 0xB5 to U+00B5 MICRO SIGN in the round-trip
compatibility table. More examples are at

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv

When you convert from Unicode to another encoding and the Unicode
character to be converted does not appear in the mapping table,
normalize both that character and the Unicode values in the mapping
table, and see whether you then find a match.

For Han, there are no normalization rules yet, but there is lots of
similar equivalence information in the Unihan database. Can't that be
used to come up with a satisfactory many-to-one mapping?

The compatibility characters are there in Unicode so that you can
choose to use either the unification rules of JIS or those of the IRG.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
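
The normalization fallback described in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the original post: the function names `build_normalized_index` and `encode_char`, the three-entry sample table, and the choice of NFKC as the normalization form are all hypothetical. It tries an exact lookup in the round-trip table first, and only if that fails compares the normalized character against the normalized table keys.

```python
import unicodedata

# Hypothetical excerpt of a round-trip table: Unicode character -> ISO 8859-1 byte.
LATIN1_TABLE = {
    "A": 0x41,
    "\u00b5": 0xB5,  # MICRO SIGN
    "\u00c5": 0xC5,  # LATIN CAPITAL LETTER A WITH RING ABOVE
}


def build_normalized_index(table, form="NFKC"):
    """Index the table by the normalized form of each Unicode key."""
    index = {}
    for uni_char, legacy_byte in table.items():
        key = unicodedata.normalize(form, uni_char)
        # If several table entries normalize to the same key, keep the first.
        index.setdefault(key, legacy_byte)
    return index


def encode_char(ch, table, normalized_index, form="NFKC"):
    """Return the legacy byte for ch, or None if no mapping is found."""
    # 1. Exact match preserves round-trip behaviour.
    if ch in table:
        return table[ch]
    # 2. Many-to-one fallback: normalize the character and look it up
    #    among the normalized table keys.
    return normalized_index.get(unicodedata.normalize(form, ch))


NORMALIZED_INDEX = build_normalized_index(LATIN1_TABLE)

# U+03BC GREEK SMALL LETTER MU is not in the table, but U+00B5 MICRO SIGN
# normalizes (NFKC) to U+03BC, so the fallback finds byte 0xB5.
mu_byte = encode_char("\u03bc", LATIN1_TABLE, NORMALIZED_INDEX)

# U+212B ANGSTROM SIGN normalizes to U+00C5, which is in the table.
angstrom_byte = encode_char("\u212b", LATIN1_TABLE, NORMALIZED_INDEX)
```

Normalizing both sides is what makes the mu case work: U+03BC itself is unchanged by NFKC, so the match only appears because the table key U+00B5 was normalized as well. A Han fallback along the same lines would need an equivalence table built from Unihan data in place of `unicodedata.normalize`.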
