L.M. Tseng wrote:

> From [EMAIL PROTECTED] Tue Feb 5 02:36:26 2002
> To: "Erin Chen" <[EMAIL PROTECTED]>, "Dave Crocker" <[EMAIL PROTECTED]>
> Cc: "IESG" <[EMAIL PROTECTED]>, "IAB" <[EMAIL PROTECTED]>,
>     "IETF IDN WG" <[EMAIL PROTECTED]>
> Subject: Re: [idn] Chinese Domain Name Consortium (CDNC) Declaration
> Dear Dave Crocker:
> My friend give me an example about CJK UNICODE, It is
> so ambiguous to me to differentiate which one is a correct Chinese
> characters or not? In our handwriting, each pair are used and mixed.
>
> 淸眞敎 U+6DF8 U+771E U+654E
> 淸眞教 U+6DF8 U+771E U+6559
> 淸真敎 U+6DF8 U+771F U+654E
> 淸真教 U+6DF8 U+771F U+6559
> 清眞敎 U+6E05 U+771E U+654E
> 清眞教 U+6E05 U+771E U+6559
> 清真敎 U+6E05 U+771F U+654E
> 清真教 U+6E05 U+771F U+6559

Huh? How is this contributing to closure on Last Call on the IDNA
documents? And why is it cc'd to IESG and IAB?

For those who may be mystified, this is the Chinese word for
"Islam", qing1zhen1jiao4. The ordinary way this would appear
in a PRC dictionary is:

    U+6E05 U+771F U+6559

and not any of the other 7 permutations.

In a more traditional dictionary as might be seen in Taiwan
or Hong Kong, it might be printed:

    U+6DF8 U+771E U+6559

and not any of the other 7 permutations.

However, if you were using a Big-5 computer in Taiwan, you would
use the same characters as for the PRC for this:

    U+6E05 U+771F U+6559

and not any of the other 7 permutations. (Though the fonts might
vary in which glyph they show, in any case.)

U+6E05 and U+771F, by the way, are examples of "traditional
simplifications" reflecting handwritten forms, that predate the
PRC systematic simplifications. The same two forms are also used
in Japan.

U+654E is another handwriting alternative for U+6559, but it is
seldom seen in printed material. U+654E is used in the PRC, Taiwan,
and in Japan alike.

All 6 characters have G, T, and K sources in 10646, and 4 of them
have J sources as well. So for this kind of overlap of forms, any
suggestion to delete G-source-only characters from the allowed
set does nothing at all.
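The eight strings in the quoted list are simply every combination of one form per position, taken from three variant pairs. A minimal Python sketch of that enumeration follows; the names `VARIANT_PAIRS`, `spellings`, and `code_points` are illustrative only and do not come from the IDNA drafts or from Unicode data files.

```python
from itertools import product

# The three variant pairs from the quoted list, one pair per position.
VARIANT_PAIRS = [
    ("\u6DF8", "\u6E05"),  # 淸 / 清
    ("\u771E", "\u771F"),  # 眞 / 真
    ("\u654E", "\u6559"),  # 敎 / 教
]


def spellings(pairs):
    """Return every string formed by picking one variant per position."""
    return ["".join(choice) for choice in product(*pairs)]


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


for word in spellings(VARIANT_PAIRS):
    print(word, code_points(word))
```

Each of the eight results is a distinct code point sequence, so a plain string comparison treats them as eight different names even though a reader sees one word.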

And lest this example be taken at face value as indicating a
problem in "CJK UNICODE", it should be noted that the presence of
these alternate forms of the "same character" in Unicode is due
to the same distinctions being made in legacy CJK character
encodings in Asia. In particular, note the following mappings:

For "GBK", Code Page 936 Simplified Chinese:

0x9C5B  0x6DF8  #CJK UNIFIED IDEOGRAPH
0xC7E5  0x6E05  #CJK UNIFIED IDEOGRAPH
0xB177  0x771E  #CJK UNIFIED IDEOGRAPH
0xD5E6  0x771F  #CJK UNIFIED IDEOGRAPH
0x949C  0x654E  #CJK UNIFIED IDEOGRAPH
0xBDCC  0x6559  #CJK UNIFIED IDEOGRAPH

And for "Shift-JIS", Code Page 932 Japanese:

0xEDE4  0x6DF8  #CJK UNIFIED IDEOGRAPH
0xFB43  0x6DF8  #CJK UNIFIED IDEOGRAPH
0x90B4  0x6E05  #CJK UNIFIED IDEOGRAPH
0xE1C1  0x771E  #CJK UNIFIED IDEOGRAPH
0x905E  0x771F  #CJK UNIFIED IDEOGRAPH
0xEDB1  0x654E  #CJK UNIFIED IDEOGRAPH
0xFACD  0x654E  #CJK UNIFIED IDEOGRAPH
0x8BB3  0x6559  #CJK UNIFIED IDEOGRAPH

So if you are working on a Windows system in either of these
legacy code pages, in China or Japan, you already have the
same options for representational ambiguity, without invoking
Unicode at all.

--Ken

>
> L.M.Tseng
>
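
The point about legacy code pages can be spot-checked with a short Python sketch. It assumes that Python's built-in `gbk` and `cp932` codecs are close enough to Windows code pages 936 and 932 for this purpose; they are not guaranteed to match the Microsoft tables byte for byte. The helper name `survives` is made up for the illustration.

```python
# The six code points discussed in the message.
CHARS = "\u6DF8\u6E05\u771E\u771F\u654E\u6559"


def survives(text, encoding):
    """True if text can be encoded in `encoding` and decoded back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeError:
        return False


# Report, per codec, whether each variant has its own legacy encoding.
for encoding in ("gbk", "cp932"):
    for ch in CHARS:
        status = "round-trips" if survives(ch, encoding) else "not representable"
        print(f"{encoding:6} U+{ord(ch):04X} {status}")
```

A character that round-trips in a codec has its own code in that legacy encoding, so the choice between variant forms already exists there before any conversion to Unicode.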
