On Tue, 8 July 2003 at 05:22, Wu Yongwei wrote:
> Is it true that "Almost all modern software that supports Unicode,
> especially software that supports it well, does so using 16-bit Unicode
> internally: Windows and all Microsoft applications (Office etc.), Java,
> MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape,
> OpenOffice/StarOffice, ... "?
Do they support characters above U+FFFF as fully as the others? For Python I
know this is not the case: the programmer must decode UTF-16 himself if he
wants to do anything with them. In C# it's even worse, because the char type
holds only 16 bits. In all of these languages the programmer must deal with
UTF-16 directly; it is not a hidden internal representation detail, so it
burdens everyone.
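To make the Python case concrete, a minimal sketch (this assumes a narrow,
16-bit Unicode build of Python 2.x, which is the common default; a wide
UCS-4 build behaves differently):

    # Assumes a narrow (16-bit) Python 2.x build; on a wide build
    # len(s) would be 1 and no surrogates would be visible.
    s = u"\U00010400"             # DESERET CAPITAL LETTER LONG I, above U+FFFF
    print len(s)                  # 2: a surrogate pair, not one character
    print repr(s[0]), repr(s[1])  # u'\ud801' u'\udc00'

    # Recovering the real code point is left to the programmer:
    hi, lo = ord(s[0]), ord(s[1])
    cp = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    print hex(cp)                 # 0x10400

len(), indexing and slicing all see the two 16-bit code units, not the
character.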
I bet most of them simply ignore the fact that Unicode no longer fits in 16
bits, or try to work around it with pain, bugs and ugly APIs.
> Marcin, don't you like the fact that ICU is there for use if you use
> UTF-16 internally (no need to reinvent the wheel)?
I don't know. Perhaps I will interface to ICU in some optional library which
provides Unicode-related algorithms; I don't want to depend on it by default.
I don't want to use UTF-16 because it combines the worst aspects of UTF-8 and
UTF-32: code points are variable-length, yet it is neither ASCII-compatible
nor as compact as ASCII.
If you want to work with sequences of code points, use UTF-32. If you want to
represent text internally or store it compactly, use UTF-8. There is no
reason to use UTF-16 except for compatibility with systems that already use
it.
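A rough size comparison, as a sketch (the sample string is arbitrary:
mostly ASCII plus one supplementary character):

    # Byte counts for a mostly-ASCII string with one character above U+FFFF.
    # utf-16-be is used so that no BOM inflates the count.
    text = u"hello, " + u"\U00010400"     # 8 code points
    for enc in ("utf-8", "utf-16-be"):
        print enc, len(text.encode(enc))
    # utf-8:     11 bytes (1 per ASCII character, 4 for U+10400)
    # utf-16-be: 18 bytes (2 per ASCII character, 4 for U+10400)
    # UTF-32 would be 4 bytes per code point: 32 bytes here, but each
    # code point is a single fixed-width unit.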
--
__("< Marcin Kowalczyk
\__/ [EMAIL PROTECTED]
^^ http://qrnik.knm.org.pl/~qrczak/