Marcin 'Qrczak' Kowalczyk wrote on 2003-07-08:

> On Tuesday, 8 July 2003 05:22, Wu Yongwei wrote:
>
> > Is it true that "Almost all modern software that supports Unicode,
> > especially software that supports it well, does so using 16-bit Unicode
> > internally: Windows and all Microsoft applications (Office etc.), Java,
> > MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> > Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/Netscape,
> > OpenOffice/StarOffice, ..."?
>
> Do they support characters above U+FFFF as fully as the others? For
> Python I know that is not the case: the programmer must decode UTF-16
> himself if he wants to do anything with them. In C# it's even worse,
> because the char type represents only 16 bits. In all these languages
> the programmer must deal with UTF-16; it is not an internal
> representation detail, so it burdens everyone.

For the record, Python can be compiled for UTF-32 (``configure
--with-unicode=ucs4``); it is so compiled on my machine ;-) and this is
becoming a popular configuration (IIRC Red Hat 9 ships with it out of
the box). The UCS-2 mode is intended for those who care little about
characters outside the BMP. UTF-16 support amounts to a few simple
hacks: Unicode literals create/decode surrogate pairs automatically,
and the built-in codecs were explicitly made to understand them. There
is no attempt to hide UTF-16 from the Python programmer, and AFAIK
there are no plans to waste time on that. Those who care much about
high characters just use UCS-4 builds, and eventually that will
probably become the default mode.
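To see what that burden looks like in practice, here is a minimal
sketch, assuming a Python 2.x UCS-2 ("narrow") build; the variable
names are illustrative, and on a UCS-4 build the literal below is a
single code unit, so no decoding is needed:

    # UCS-2 ("narrow") build assumed: a literal above the BMP is
    # stored as a UTF-16 surrogate pair of two code units.
    s = u'\U00010000'
    assert len(s) == 2
    hi, lo = ord(s[0]), ord(s[1])
    assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF

    # Manual decoding, exactly the chore the quoted complaint
    # describes: recombine the pair into the real code point.
    code_point = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    assert code_point == 0x10000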
From a UCS-2 build:

>>> len(u'\U0000FFFF')
1
>>> len(u'\U00010000')
2
>>> import sys
>>> sys.maxunicode
65535
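For contrast, a UCS-4 build gives this (a sketch of the expected
output, not pasted from a real session): len() counts whole characters
and sys.maxunicode is 1114111, i.e. 0x10FFFF:

>>> len(u'\U0000FFFF')
1
>>> len(u'\U00010000')
1
>>> import sys
>>> sys.maxunicode
1114111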
--
Beni Cherniavsky <[EMAIL PROTECTED]>

If I don't hack on it, who will?
And if I don't GPL it, what am I?
And if it itches, why not now?
    [With apologies to Hillel ;]