Marcin 'Qrczak' Kowalczyk wrote on 2003-07-08:

> On Tue, 8 July 2003 at 05:22, Wu Yongwei wrote:
>
> > Is it true that "Almost all modern software that supports Unicode,
> > especially software that supports it well, does so using 16-bit Unicode
> > internally: Windows and all Microsoft applications (Office etc.), Java,
> > MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> > Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape,
> > OpenOffice/StarOffice, ... "?
>
> Do they support characters above U+FFFF as fully as the others? For Python
> I know that it's not the case, because the programmer must decode UTF-16
> himself if he wants to do anything with them. In C# it's even worse,
> because the char type represents only 16 bits. In all of the languages
> listed, the programmer must deal with UTF-16; it is not an internal
> representation detail, so it burdens everyone.
>
For the record, Python can be compiled for UTF-32 (``configure
--with-unicode=ucs4``); it is so compiled on my machine ;-) and this
is becoming a popular configuration (IIRC Red Hat 9 ships with it out
of the box).  The UCS-2 mode is intended for those who care little
about characters outside the BMP.  UTF-16 support amounts to a few
simple hacks: unicode literals create/decode surrogate pairs
automatically, and the built-in codecs were explicitly made to
understand them.  There is no attempt to hide UTF-16 from the Python
programmer, and AFAIK there are no plans to waste time on one.  Those
who do care about high characters just use UCS-4 builds, and
eventually that will probably become the default mode.
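
To make the literal and codec behaviour concrete, a minimal
illustration (assuming a narrow build; on a wide build the escape
yields a single code unit, and the encoded form comes out the same
either way):

>>> u'\U00010000' == u'\ud800\udc00'   # the literal expands to a surrogate pair
True
>>> u'\ud800\udc00'.encode('utf-8')    # the codec re-joins the pair
'\xf0\x90\x80\x80'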

From a UCS-2 build:

>>> len(u'\U0000FFFF')
1
>>> len(u'\U00010000')
2
>>> import sys
>>> sys.maxunicode
65535
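
And to illustrate what "decode UTF-16 himself" means in practice,
here is a hypothetical helper (not part of Python, just a minimal
sketch) that counts true code points on a narrow build by pairing
surrogates; on a wide build it simply agrees with len():

def code_points(s):
    """Count code points in a unicode string, joining UTF-16
    surrogate pairs as produced by narrow (UCS-2) builds."""
    n = i = 0
    while i < len(s):
        # A high surrogate followed by a low surrogate is one character.
        if (0xD800 <= ord(s[i]) <= 0xDBFF and i + 1 < len(s)
                and 0xDC00 <= ord(s[i + 1]) <= 0xDFFF):
            i += 2
        else:
            i += 1
        n += 1
    return n

>>> code_points(u'\U00010000')
1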

-- 
Beni Cherniavsky <[EMAIL PROTECTED]>

If I don't hack on it, who will?  And if I don't GPL it, what am I?
And if it itches, why not now?  [With apologies to Hillel ;]