> Actually it is funny that you call it Unicode. UTF-8 clearly comes from
> the 10646 side of UCS, Unicode did not invent it at all, Unicode
> was from the beginning set out to be a 16-bit code *only*.

That may be, but that is history now. Most of 10646 is the result of the
merger between Unicode and 10646 in the early 1990s.

> Then Unicode was
> reluctantly persuaded to do 31-bit and later they were persuaded

This is a tainted description. However, WG2 (10646) was "persuaded" to
include UTF-16, so it goes both ways.

> also to use the UTF-8. Very recently Unicode introduced UTF-32,
> which is reflecting what has been using all the time.

> The way 10646 is coming to Linux is also much
> with the support from the ISO 14651 sorting standard

14651 is good, but has some flaws. See Unicode Technical Standard #10
(the Unicode Collation Algorithm, http://www.unicode.org/reports/tr10/),
which avoids SOME (not all) of those flaws.

> and the ISO TR 14652 locale standard.

14652 is NOT a standard. It is also very unlikely ever to develop into
one. Keld, please stop promoting it as a standard when you know very
well that it is NOT a standard.

> I think the proper way to characterize what we do now in Linux is
> to say ISO 10646, and probably mention Unicode in parenthesis the first
> time it appears. It should not be that difficult, we have been
> referring ISO 8859 for a long time. So please use ISO 10646
> instead of the name Unicode when you refer to this in articles etc.

No, why? Unicode and 10646 are kept in sync regarding character names
and the positions to which the characters are allocated, as well as a
few other things. Unicode, however, provides a lot of data, algorithms,
and hints that are not provided (adequately) by any ISO standard. It
therefore makes sense to refer to Unicode, and to use the Unicode
Character Database (http://www.unicode.org/ucd/), the mapping tables
(http://www.unicode.org/Public/MAPPINGS/), and the algorithms specified
by Unicode (such as the normalisation algorithm,
http://www.unicode.org/reports/tr15/, and the BiDi algorithm,
http://www.unicode.org/reports/tr9/).

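To make the point about data and algorithms a bit more concrete, here is
a minimal sketch. It assumes Python and its standard unicodedata module
(my choice for illustration, nothing in this thread depends on it); that
module is generated from the Unicode Character Database and implements
the normalisation forms of tr15. The helper name describe() is
hypothetical.

```python
import unicodedata


def describe(ch):
    """Return a few UCD properties for a single character.

    Hypothetical helper for illustration; the property values come
    from the Unicode Character Database shipped with Python.
    """
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "bidi_class": unicodedata.bidirectional(ch),
    }


# Normalisation (tr15): "e" followed by COMBINING ACUTE ACCENT
# composes to the single character U+00E9 under NFC, and decomposes
# back to the two-character sequence under NFD.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)
roundtrip = unicodedata.normalize("NFD", composed)

# The same abstract character in the three encoding forms discussed
# above (big-endian variants, so that no byte order mark is emitted).
encoded_lengths = {
    "UTF-8": len(composed.encode("utf-8")),
    "UTF-16": len(composed.encode("utf-16-be")),
    "UTF-32": len(composed.encode("utf-32-be")),
}

if __name__ == "__main__":
    print(describe(composed))
    print(describe("\u05d0"))  # a right-to-left letter, input to the BiDi algorithm
    print(encoded_lengths)
```

The bidi_class property is only the input to the BiDi algorithm of tr9;
the sketch does not implement the algorithm itself. Character names and
code positions are shared with 10646, but the category, bidi class, and
decomposition data used here are published by Unicode.
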
Kind regards
/kent k

> Kind regards
> keld

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
