Re: [idn] Chinese Domain Name Consortium (CDNC) Declaration

Kenneth Whistler Tue, 05 Feb 2002 20:42:44 -0800

L.M. Tseng wrote:

> From [EMAIL PROTECTED] Tue Feb  5 02:36:26 2002
> To: "Erin Chen" <[EMAIL PROTECTED]>, "Dave Crocker" <[EMAIL PROTECTED]>
> Cc: "IESG" <[EMAIL PROTECTED]>, "IAB" <[EMAIL PROTECTED]>,
>         "IETF IDN WG" <[EMAIL PROTECTED]>
> Subject: Re: [idn] Chinese Domain Name Consortium (CDNC) Declaration


> Dear Dave Crocker:
>     My friend gave me an example about CJK UNICODE. It is unclear
> to me how to tell which one is a correct Chinese character and
> which is not. In our handwriting, both forms in each pair are used
> and mixed.
> 
> 淸眞敎 U+6DF8 U+771E U+654E
> 淸眞教 U+6DF8 U+771E U+6559
> 淸真敎 U+6DF8 U+771F U+654E
> 淸真教 U+6DF8 U+771F U+6559
> 清眞敎 U+6E05 U+771E U+654E
> 清眞教 U+6E05 U+771E U+6559
> 清真敎 U+6E05 U+771F U+654E
> 清真教 U+6E05 U+771F U+6559

Huh? How is this contributing to closure on Last Call on
the IDNA documents? And why is it cc'd to IESG and IAB?

For those who may be mystified, this is the Chinese word for
"Islam", qing1zhen1jiao4.

The ordinary way this would appear in a PRC dictionary is:

   U+6E05 U+771F U+6559

and not any of the other 7 permutations.

In a more traditional dictionary as might be seen in Taiwan
or Hong Kong, it might be printed:

   U+6DF8 U+771E U+6559

and not any of the other 7 permutations.

However, if you were using a Big-5 computer in Taiwan,
you would use the same characters as for the PRC for
this:

   U+6E05 U+771F U+6559

and not any of the other 7 permutations (though the fonts
might vary in which glyph they actually show).
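
For anyone who wants to see where the 8 sequences in the quoted
list come from: they are just three two-way choices multiplied
out. A rough Python sketch follows; the names VARIANT_PAIRS,
all_spellings, and code_points are made up for illustration and
are not taken from any IDN draft.

```python
from itertools import product

# The three variant pairs from the quoted list, one pair per syllable.
VARIANT_PAIRS = [
    ("\u6DF8", "\u6E05"),  # qing1
    ("\u771E", "\u771F"),  # zhen1
    ("\u654E", "\u6559"),  # jiao4
]


def all_spellings(pairs):
    """Return every sequence formed by picking one form from each pair."""
    return ["".join(choice) for choice in product(*pairs)]


def code_points(text):
    """Format a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


if __name__ == "__main__":
    for spelling in all_spellings(VARIANT_PAIRS):
        print(spelling, code_points(spelling))
```

Run as a script, this lists the 8 sequences in the same order
as the quoted message. Of those, only the two forms shown above
are ones you would normally expect to find in a dictionary.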

U+6E05 and U+771F, by the way, are examples of "traditional
simplifications" reflecting handwritten forms, that
predate the PRC systematic simplifications. The same two
forms are also used in Japan.

U+654E is another handwriting alternative for U+6559, but
it is seldom seen in printed material. U+6559 is used in
the PRC, Taiwan, and in Japan alike.

All 6 characters have G, T, and K sources in 10646, and
4 of them have J sources as well. None of them is G-source-only,
so for this kind of overlap of forms, any suggestion to delete
G-source-only characters from the allowed set does nothing at all.

And lest this example be taken at face value as indicating
a problem in "CJK UNICODE", it should be noted that these
alternate forms of the "same character" are present in Unicode
because the same distinctions are made in legacy CJK character
encodings in Asia. In particular, note the following mappings
(legacy code on the left, Unicode value on the right):

For "GBK", Code Page 936 Simplified Chinese:

0x9C5B  0x6DF8  #CJK UNIFIED IDEOGRAPH
0xC7E5  0x6E05  #CJK UNIFIED IDEOGRAPH
0xB177  0x771E  #CJK UNIFIED IDEOGRAPH
0xD5E6  0x771F  #CJK UNIFIED IDEOGRAPH
0x949C  0x654E  #CJK UNIFIED IDEOGRAPH
0xBDCC  0x6559  #CJK UNIFIED IDEOGRAPH

And for "Shift-JIS", Code Page 932 Japanese:

0xEDE4  0x6DF8  #CJK UNIFIED IDEOGRAPH
0xFB43  0x6DF8  #CJK UNIFIED IDEOGRAPH
0x90B4  0x6E05  #CJK UNIFIED IDEOGRAPH
0xE1C1  0x771E  #CJK UNIFIED IDEOGRAPH
0x905E  0x771F  #CJK UNIFIED IDEOGRAPH
0xEDB1  0x654E  #CJK UNIFIED IDEOGRAPH
0xFACD  0x654E  #CJK UNIFIED IDEOGRAPH
0x8BB3  0x6559  #CJK UNIFIED IDEOGRAPH

So if you are working on a Windows system in either of
these legacy code pages, in China or Japan, you
already have the same options for representational
ambiguity, without invoking Unicode at all.
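
The mappings above can be spot-checked with a short Python
sketch. It assumes that the standard library codecs "gbk" and
"cp932" follow the Microsoft tables quoted here, which I have
not verified for every row. The helper names
decode_double_byte and report are hypothetical.

```python
# Two-byte legacy codes copied from the tables above.
GBK_CODES = [0x9C5B, 0xC7E5, 0xB177, 0xD5E6, 0x949C, 0xBDCC]
SJIS_CODES = [0xEDE4, 0xFB43, 0x90B4, 0xE1C1,
              0x905E, 0xEDB1, 0xFACD, 0x8BB3]


def decode_double_byte(codec, value):
    """Decode one two-byte legacy code; return its Unicode code point."""
    text = value.to_bytes(2, "big").decode(codec)
    return ord(text)


def report(codec, values):
    """Print each legacy code next to the code point it decodes to."""
    for value in values:
        code_point = decode_double_byte(codec, value)
        print(f"{codec}: 0x{value:04X} -> U+{code_point:04X}")


if __name__ == "__main__":
    report("gbk", GBK_CODES)
    report("cp932", SJIS_CODES)
```

If the codecs agree with the tables, each code page by itself
yields both members of all three pairs, with no Unicode
processing step needed to create the ambiguity.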

--Ken

> 
> L.M.Tseng
>
