On Tue, 8 July 2003 05:22, Wu Yongwei wrote:

> Is it true that "Almost all modern software that supports Unicode,
> especially software that supports it well, does so using 16-bit Unicode
> internally: Windows and all Microsoft applications (Office etc.), Java,
> MacOS X and its applications, ECMAScript/JavaScript/JScript, Python,
> Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape,
> OpenOffice/StarOffice, ... "?

Do they support characters above U+FFFF as fully as the rest? For Python I 
know that is not the case, because the programmer must decode UTF-16 himself 
if he wants to do anything with them. In C# it's even worse, because the char 
type holds only 16 bits. In all of these languages the programmer must deal 
with UTF-16 directly; it is not an internal representation detail, so it 
burdens everyone.
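
To make the burden concrete, here is a minimal Python sketch (illustrative 
only, not tied to any of those languages' APIs) of what "decoding UTF-16 
yourself" means: recombining a surrogate pair into the code point it stands 
for.

    # Sketch: recombining a UTF-16 surrogate pair by hand.
    # The function name and example character are illustrative.
    def combine_surrogates(hi, lo):
        # hi must be a high surrogate (U+D800..U+DBFF),
        # lo a low surrogate (U+DC00..U+DFFF).
        assert 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF
        return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

    # U+1D11E (MUSICAL SYMBOL G CLEF) as two 16-bit code units:
    print(hex(combine_surrogates(0xD834, 0xDD1E)))   # -> 0x1d11e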

I bet most of them simply ignore the fact that Unicode no longer fits in 16 
bits, or try to work around it with pain, bugs and ugly APIs.

> Marcin, don't you like the fact that ICU is there for use if you use
> UTF-16 internally (no need to reinvent the wheel)?

I don't know. Perhaps I will interface to ICU in some optional library which 
provides Unicode-related algorithms; I don't want to depend on it by default.

I don't want to use UTF-16 because it combines the worst aspects of UTF-8 and 
UTF-32: variable-length code points, without being ASCII-compatible or as 
compact as ASCII.
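
For example, a quick sketch (Python 3 assumed, purely for illustration) that 
encodes the same strings three ways shows both problems: UTF-16 still needs 
surrogate pairs for characters above U+FFFF, yet takes twice as many bytes as 
UTF-8 for ASCII text.

    # Sketch: byte counts of the same text in the three encodings.
    # Little-endian variants are used so no BOM is counted.
    for text in ["hello, world", "\U0001D11E"]:
        print(len(text.encode("utf-8")),
              len(text.encode("utf-16-le")),
              len(text.encode("utf-32-le")))
    # "hello, world": 12 / 24 / 48 bytes; the UTF-8 bytes are the ASCII
    #                 bytes themselves, UTF-16 interleaves NUL bytes.
    # U+1D11E:         4 /  4 /  4 bytes; UTF-16 needs a surrogate pair,
    #                 so it is variable-length just like UTF-8.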

If you want to work with sequences of code points, use UTF-32. If you want to 
represent text internally or compactly, use UTF-8. There is no reason to use 
UTF-16 except compatibility with systems that already use it.
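
A short sketch of the code-point case (Python 3 again, illustrative): UTF-32 
is a flat array of 32-bit units, one per code point, so getting at the n-th 
code point needs no surrogate handling at all.

    # Sketch: unpacking UTF-32-LE into a tuple of code points.
    import struct
    data = "a\U0001D11Eb".encode("utf-32-le")
    codepoints = struct.unpack("<%dI" % (len(data) // 4), data)
    print([hex(cp) for cp in codepoints])   # ['0x61', '0x1d11e', '0x62']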

-- 
   __("<         Marcin Kowalczyk
   \__/       [EMAIL PROTECTED]
    ^^     http://qrnik.knm.org.pl/~qrczak/

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
