On Fri, 30 Nov 2001 17:54:37 -0500 John C Klensin <[EMAIL PROTECTED]> writes:

> --On Friday, 30 November, 2001 13:21 -0600 liana Ye
> <[EMAIL PROTECTED]> wrote:
>
> > Thanks John for you are addressing what I am looking
> > for. And I do think bottom up discussion hitting the
> > wall after the introducing TC/SC, because it is a part of the
> > CJK problem that JET has been facing. It is a good time to
> > look at top-down wise. And we will hit CJK problem
> > too as codepoint usage conflicts.
>
> But there is very little top-down where the DNS itself is
> concerned. Despite upward-facing uses, it is fundamentally
> downward-facing, and that is, in different language, what the
> "identifier" discussion is all about.
>
> > So I think CJK code points usage conflicts should be
> > resolved before this group can discuss solutions effectively.
>
> I think that statement is equivalent to "the IDN WG should never
> finish its work, or at least should take years to do so". With
> the understanding that I don't read Han characters, I think
> there have been ample illustrations over the last several months
> that completely resolving "CJK code point(s) usage conflicts"
> will require that the Japanese and Korean languages be reformed,
> possibly to use a different character set base entirely.
That is the reason I am asking "how different" the three languages are in using these codepoints. Can we come up with another way to look at these code points? Since they have already been through visual classification by the Unicode Consortium, should we pick another measure to look at them?

Characters have four distinct aspects: visual, phonetic, composition, and semantic. The visual aspect of a character has already been scrutinized by the Unicode Consortium. The composition aspect is a subject for a long discussion, enough to fill a book.

The common measure is semantic similarity. But how precisely can we classify them: as synonyms, like what is in a dictionary, used as keywords? That is the layer 3 approach; it may be effective in free text, but in structured domain names, different codepoints of a synonym set mean different entities. They cannot be mixed, nor used to reduce the search space needed for IDN identifiers.

The next option along semantic classification is TC/SC-based codepoints, which is what the Chinese group demands. To say they are merely font differences is not the whole picture, but I have been using that description to STRESS that they are "semantically equivalent" among Chinese users, that they serve a good purpose for internationalization of domain names, and that they help reduce trademark conflicts on the net.

I am trying to divide the problem into separate parts, for examination:

1) Are these symbols semantically equivalent among C, J, K?

2) If they are, we can do

> > For example:
> > One Latin case from [nameprep]:
> > 0048; 0068; Case map
> >...
> > And Chinese TC/SC example:
> > <wind> has four code points in Chinese:
> > TC, SC, TC radical, SC radical.
> >...

3) If they are not equivalent, we should record their language context and handle them in a similar way to Bengali, Tibetan, ...

I do not see what is so wrong with such a request to the Unicode group.
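To make the parallel in 2) concrete: if the group did agree that certain pairs are semantically equivalent, a fold could be applied to labels in the same mechanical way nameprep applies its case map. A minimal sketch in Python, assuming a hypothetical two-entry variant table (the 0048 -> 0068 case map quoted above, plus the TC/SC pair U+98A8 / U+98CE for <wind>; the table itself is illustrative, not any standard mapping):

```python
# Toy "variant fold" table in the spirit of the nameprep case map.
# These two entries are examples only, not a real standard table.
VARIANT_FOLD = {
    0x0048: 0x0068,  # LATIN CAPITAL LETTER H -> LATIN SMALL LETTER H (case map)
    0x98A8: 0x98CE,  # <wind>: Traditional Chinese U+98A8 -> Simplified U+98CE
}

def fold_label(label: str) -> str:
    """Map each code point through the variant table; others pass unchanged."""
    return "".join(chr(VARIANT_FOLD.get(ord(ch), ord(ch))) for ch in label)

print(fold_label("H\u98a8"))  # prints "h\u98ce": both code points are folded
```

The point of the sketch is only that step 2) reduces to a table lookup, exactly like case mapping; the hard work, which is what the question to the Unicode group asks for, is deciding which entries belong in the table.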
After all, the JET has gone over this many times; can I have a look at what these codepoints are with an Acrobat Reader?

I don't feel like getting into more aspects of a character at this time and raising more discussion points. But since I have brought up the four aspects of a character, I will mention that the last aspect is its phonetic attribute. If we could use it for character encoding, then we would end up with a phonetic ACE.

Liana
