On Fri, Jun 13, 2003 at 03:18:59PM +0100, Markus Kuhn wrote:
>
> [Fortunately though, UTF-16 remains of little bother to anyone in the
> Unix/Plan9 world, where UTF-16 and its 0x10ffff limit are virtually
> unheard of, except for the occasional shaking of heads, and very likely
> will remain so. The reduction from 0xffffffff to 0x7fffffff was
> technically reasonable though, as it permits the use of signed 32-bit
> integer types. UTF-16 remains an ugly miscarriage, because by placing
> the surrogates not at the end of the 16-bit space but into the middle of
> the code range, it leads to an incompatible binary sorting order in
> B-trees with UCS-4 and UTF-8 and therefore is useless for database
> applications that want to hide the internal encoding from the user of
> B-tree iterators.]
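
As a small illustration of the sorting point in the quoted text, here is a minimal Python sketch. The two sample code points and the helper name binary_order are my own choices for the example, not anything from the standards. It compares the raw encoded bytes, which is roughly what a B-tree keyed on the binary representation would do.

```python
# Two code points chosen to straddle the surrogate block (U+D800..U+DFFF).
bmp_high = "\uff5e"            # U+FF5E, a BMP character above the surrogates
supplementary = "\U00010000"   # U+10000, first code point beyond the BMP


def binary_order(a, b, encoding):
    """Return the two strings sorted by their encoded byte sequences.

    Hypothetical helper for this example only.
    """
    return sorted([a, b], key=lambda s: s.encode(encoding))


# UTF-8 and UCS-4 (UTF-32, big-endian) both sort in code point order.
utf8_order = binary_order(bmp_high, supplementary, "utf-8")
ucs4_order = binary_order(bmp_high, supplementary, "utf-32-be")

# In UTF-16 (big-endian), U+10000 becomes the surrogate pair D800 DC00,
# which compares lower than FF5E, so the order is reversed.
utf16_order = binary_order(bmp_high, supplementary, "utf-16-be")

print([hex(ord(s)) for s in utf8_order])
print([hex(ord(s)) for s in ucs4_order])
print([hex(ord(s)) for s in utf16_order])
```

With these two inputs, UTF-8 and UCS-4 agree with each other while UTF-16 puts U+10000 before U+FF5E, because the surrogates sit below the top of the 16-bit range.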

Hmm, UTF-8 is now being redefined as only 21-bit. The Internet
definition of UTF-8 is being revised to refer to the Unicode
specification. I believe, though, that UTF-8 as it is defined today in
ISO/IEC 10646 is still 31-bit. But Unicode is taking over the world,
and this will probably happen in the Unix/Plan9 world as well.

Best regards
keld
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
