Soobok Lee stated:

> Unicode is not designed for identifiers use but rather for
> display or printing devices, from the beginning.

This is manifestly untrue. Unicode 1.0, published in 1991, specifically talks about "comparing text in operations such as determining sort order of two strings, or filtering or matching strings." And the earliest implementations of Unicode, which were in development as early as 1990, were already making use of Unicode strings as identifiers and object labels. I know, because I was directly involved in one such implementation.

It is true, however, that the Unicode Standard itself didn't get around to making recommendations about how to deal with Unicode identifiers until Unicode 2.0 in 1996. So I can see how people might be confused about the design intent.

> But,
> Unicode is ever evolving to expand its application areass.

The way I would put that is that more and more application areas are coming to grips with the implications of working with the Universal Character Set.

> It's astonishing Unicode has not yet any concrete lists of
> TC/SC 1:1 and 1:n equivalences.

Partial lists have been available since Unicode 2.0, with the first publication of Unihan.txt. And with each major or minor version of Unicode, tremendous effort has been expended for further refining and adding to the immense amount of information provided in Unihan.txt about all the Han characters, their sources, and variants. Unicode 3.2 will see another significant step forward in the refinement of that information.

But the only thing "astonishing" here is that you find it astonishing that no simple, complete TC/SC listing has yet been compiled. The Unicoders in this discussion have been asserting that the issue of "simplified" and "traditional" variants in Han is enormously complex, and is not amenable to simple, uncontextualized lists. Why then are you astonished that we have not produced a simple, uncontextualized list?

You want a published list? Go buy Sanseido's Unicode Kanji Information Dictionary.
(ISBN 4-385-13690-4) Then look at all the cross-references (of distinct types) and the annotations of simplified forms (kantaiji) and variant forms (itaiji). Then tell me how long you think it would take to digest that down to a consensus list that would be acceptable for IDN use and which would provide user-tested "acceptability" for both Chinese and Japanese users.

> I admit UNicode is the best solution as for now , but not the sufficienly
> mature solution enough to serve the global language communities,
> especially chinese.

This is not an opinion apparently shared by the Chinese government, which has recently mandated the use and implementation of their latest national standard, GB 18030-2000, which contains the exact same repertoire of Han characters as in Unicode 3.0 (and ISO/IEC 10646-1:2000), with all the same warts and variant idiosyncrasies that you claim render Unicode an insufficiently mature solution to serve the Chinese language community. Who's right here?

> IDN deployment is not a reversible process.

That much I agree with. ;-)

--Ken
