Hi Ken,

Thanks for your input. My responses are below:

> I have to disagree. I am certain that labelling what script an
> IDN is in will just cause problems.

It will *never* cause any problems, because the script label of an IDN does not do anything by itself. In the worst case it is just a part of the domain name that serves absolutely no purpose (just as ".ca" serves no purpose). However, if and when people decide to make use of it, it becomes a very powerful system.

> At the very least, this will introduce an entire new class of
> error conditions, where the label says one thing, but the
> character content of the IDN does not in fact match the label.

With a .<traditional> label, if the IDN is entered in simplified Chinese, then either an error will alert the user or the simplified Chinese will be converted automatically to traditional Chinese. An IDN that ends with ".ca" obviously cannot provide such a benefit.

> Furthermore, the example we have been talking about here,
> traditional versus simplified Chinese, is not even a script
> difference in the first place. "Traditional" versus "Simplified"
> in a character set context, and as typically implemented,
> refers to distinctions between Code Page 950 (Big 5) and
> Code Page 936 (GBK, etc.), together with the fonts, input methods,
> message resource files, and such, as needed
> to support them. And either of those character sets is actually
> mixed script, since they both support Latin characters from
> ASCII, as well as the basic Greek alphabet and Bopomofo.
> "Simplified Chinese" also supports the basic Cyrillic alphabet
> and Hiragana and Katakana for Japanese.

Despite what certain input methods, tables, encodings, files, applications, etc. may support, simplified Chinese is simplified Chinese and traditional Chinese is traditional Chinese: there is no Greek, no Japanese, no English, and no funny symbols.
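
To make the .<traditional> check described above concrete, here is a minimal sketch in Python. Everything named in it is hypothetical and made up for illustration: the three-entry mapping table, the function check_traditional_label, and its convert flag. A real registry or client would need a complete, reviewed mapping, and would have to decide what to do with simplified characters that correspond to more than one traditional form.

```python
# Hypothetical, deliberately tiny simplified -> traditional table.
# A real system would need a complete and reviewed mapping.
SIMPLIFIED_TO_TRADITIONAL = {
    "\u56fd": "\u570b",  # 国 -> 國
    "\u5b66": "\u5b78",  # 学 -> 學
    "\u6c49": "\u6f22",  # 汉 -> 漢
}


class LabelMismatchError(ValueError):
    """The name contains characters that do not match its script label."""


def check_traditional_label(name, convert=False):
    """Check a name registered under a .<traditional> label.

    Returns the name unchanged if no known simplified character is found.
    Otherwise either converts it (convert=True) or raises an error so the
    user can be alerted (convert=False).
    """
    mismatched = [ch for ch in name if ch in SIMPLIFIED_TO_TRADITIONAL]
    if not mismatched:
        return name
    if convert:
        # Replace each known simplified character; leave the rest alone.
        return "".join(SIMPLIFIED_TO_TRADITIONAL.get(ch, ch) for ch in name)
    raise LabelMismatchError(
        "simplified characters under .<traditional>: " + "".join(mismatched)
    )
```

Calling check_traditional_label with convert=True corresponds to the "converted automatically" case, and the default corresponds to the "error will alert the user" case.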

> Even if you are just talking about Traditional versus
> Simplified Chinese characters (ideographs) within the
> Han script subparts of Code Page 950 or Code Page 936, the
> distinction is not as clean as you might think it would be.
> The PRC simplified set, even in its earlier forms in GB 2312,
> contain *some* traditional forms for characters. But the
> current extensions, first for GBK (~ Microsoft Code Page 936),
> and now for GB 18030, incorporate *all* of the Han characters
> from the Unicode 3.0 repertoire, which means that a
> "Simplified" code page for China now contains *all* of the
> traditional characters from Code Page 950, as well as all
> the simplified characters from Unicode 3.0.

This is a good point. It illustrates how the line between Traditional Chinese and Simplified Chinese is becoming blurred because the two are so often used interchangeably, hence the need for TC<->SC conversion. However, if you are trying to say that there is no distinction between them because GBK/GB18030 has incorporated everything, then you are absolutely incorrect. Traditional Chinese is traditional Chinese and simplified Chinese is simplified Chinese.

> And of course, Unicode data itself encompasses both simplified
> and traditional forms of Chinese ideographs. So what would the
> IDN distinction between simplified and traditional mean if
> data was encoded in Unicode?
>
> Even the identification of scripts is non-trivial. Many
> characters are *shared* between scripts, or are borrowed
> from one script to the next. Cyrillic and Latin have a long
> history of cross-borrowing forms from one script into the
> other, for example, for special uses. And Japanese got all
> its Chinese characters (kanji) in the first place by
> borrowing them from Chinese.

Characters that share the same Unicode code point can definitely be labeled as different scripts, whether Chinese, Japanese, or any other (much like "same.com" and "same.ca").

The benefit of this IDN distinction should be obvious, since the same character (with the same Unicode code point) may have a completely different meaning in different scripts: one labeled <same>.<traditional> (written in Traditional Chinese, of course) and the other labeled <same>.<simplified> (written in Simplified Chinese).

Thanks,
Ben
