In a message dated 2001-10-09 23:12:59 Pacific Daylight Time, [EMAIL PROTECTED] writes:
>> To say that UTF-8 does not preserve case distinctions is complete
>> nonsense. It is the nameprep stage that folds away case distinctions
>> (for better or worse).
>
> If you mean casing of Latin characters , you may be
> right. but you can try the characters in (u+F94D , u+6dda), (u+F950, u+7e37)
> that are compatibility CJK ideograph characters in www.unicode.org .

Neither the compatibility CJK ideographs nor any other CJK ideographs
have "case" in the sense that bicameral alphabets like Latin, Greek, and
Cyrillic have "case." U+F94D and U+6DDA are not upper-case or lower-case
forms of one another. The great majority of the compatibility CJK
ideographs, and specifically the two L.M. Tseng cited, exist solely for
round-trip conversion from a legacy encoding.

> If these characters are mapped to single one code point like the case mapping
> in ASCII , you can not use UTF-8 to do case-like-insensitive comparation and
> to keep case-prserving . The difference come from the relative to LDH-DNS .

Nothing in Unicode or ISO/IEC 10646, and certainly nothing in UTF-8,
maps the two characters in such a pair to a single code point. The
standard cross-references the compatibility characters to the "real"
characters, but in informative notes only.

> I don not against UTF-8 , but AMC-ACE-Z can support
> case-code- mapping and case-preserving-after-code-mapping and
> case-sensitive-comparation all coexisted. It is an intergreted properties in
> LDH-DNS.

I thought nameprep was the mechanism that handled equivalencies like
this. If AMC-ACE-Z *without nameprep* can equate U+F94D with U+6DDA, it
is a lot more complicated and requires much larger tables than Adam is
letting on.

I suspect the real comparison here is between "UTF-8 without nameprep"
and "ACE with nameprep." This is like preferring fresh apples over
rotten oranges; it has nothing to do with the relative merits of apples
and oranges.

-Doug Ewell
Fullerton, California
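
P.S. For anyone who wants to check the points above about U+F94D and U+6DDA, here is a rough sketch in Python. It rests on several assumptions: the helper name describe() is made up for illustration, Python's unicodedata module reflects whatever Unicode version ships with the interpreter rather than the version under discussion here, and NFKC normalization is used only as a stand-in for the normalization step of nameprep, not for nameprep as a whole. The sketch checks that UTF-8 keeps the two code points of each pair distinct, that neither has case, and that they become equal only once a separate normalization step is applied.

```python
import unicodedata

# The two pairs cited in the thread:
# (compatibility ideograph, corresponding unified ideograph).
PAIRS = [("\uF94D", "\u6DDA"), ("\uF950", "\u7E37")]


def describe(compat, unified):
    """Hypothetical helper: report how one pair behaves at each layer."""
    compat_bytes = compat.encode("utf-8")
    unified_bytes = unified.encode("utf-8")
    return {
        # UTF-8 is only an encoding form: distinct code points stay distinct.
        "utf8_differs": compat_bytes != unified_bytes,
        # Decoding returns the original code point unchanged.
        "round_trips": compat_bytes.decode("utf-8") == compat,
        # Case mapping leaves a caseless character as it is.
        "has_case": compat.upper() != compat or compat.lower() != compat,
        # Any folding happens in a separate normalization step
        # (NFKC here, as an assumed stand-in for that part of nameprep).
        "equal_after_nfkc": unicodedata.normalize("NFKC", compat)
        == unicodedata.normalize("NFKC", unified),
    }


for compat, unified in PAIRS:
    print(f"U+{ord(compat):04X} vs U+{ord(unified):04X}:", describe(compat, unified))
```

If the pairs compare equal only in the last check, that is consistent with the equivalence living in the normalization stage rather than in the encoding, whichever encoding is chosen.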
