Is it true that "Almost all modern software that supports Unicode, especially software that supports it well, does so using 16-bit Unicode internally: Windows and all Microsoft applications (Office etc.), Java, MacOS X and its applications, ECMAScript/JavaScript/JScript, Python, Rosette, ICU, C#, XML DOM, KDE/Qt, Opera, Mozilla/NetScape, OpenOffice/StarOffice, ... "?
http://oss.software.ibm.com/pipermail/icu4c-support/2003-March/001300.html

There has been a discussion of the virtues of UTF-16 vs. UTF-32 on this very list since August 2000:
http://mail.nl.linux.org/linux-utf8/2000-08/msg00043.html

Markus Scherer explains the reasons for using a 16-bit Unicode character type in ICU:
http://oss.software.ibm.com/icu/archives/icu/icu.0001/msg00040.html

This paper shows how to `fix' UTF-16 comparison so that it yields code point order:
http://oss.software.ibm.com/icu/docs/papers/utf16_code_point_order.html

Other people have said much already. I only want to stress that performance is a big deal (as the authors of Efficient C++ put it), and so is efficiency. Sacrificing things like flexibility for performance is always a pain, but sometimes it is necessary because it meets the real needs of end users, not the aesthetic preferences of programmers. A larger memory footprint can mean more swapping, lower cache hit rates, and slower execution (sometimes by an order of magnitude). I don't think anyone likes that, either.

Marcin, don't you like the fact that ICU is there for you to use if you keep UTF-16 internally (no need to reinvent the wheel)?

Hope this time it is not irrelevant.

Best regards,

Wu Yongwei
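
P.S. For concreteness, here is a rough sketch of the comparison `fix' described in the utf16_code_point_order paper linked above. It is not ICU's actual code; it assumes well-formed UTF-16 input (no unpaired surrogates), and the function names are made up for illustration.

#include <cstddef>
#include <cstdint>

// Rotate the 0xD800..0xFFFF region so that surrogates compare greater
// than all other BMP code units.  (Surrogate pairs always stand for
// code points >= U+10000, so in code point order they must sort after
// U+E000..U+FFFF, not before.)
static inline std::uint16_t fixup(std::uint16_t c) {
    if (c >= 0xD800) {
        if (c >= 0xE000) c -= 0x800;   // U+E000..U+FFFF -> 0xD800..0xF7FF
        else             c += 0x2000;  // surrogates     -> 0xF800..0xFFFF
    }
    return c;
}

// Compare two well-formed UTF-16 strings in Unicode code point order.
// Returns <0, 0, or >0 like strcmp.
int compare_utf16_code_point_order(const std::uint16_t* s1, std::size_t n1,
                                   const std::uint16_t* s2, std::size_t n2) {
    std::size_t n = n1 < n2 ? n1 : n2;
    for (std::size_t i = 0; i < n; ++i) {
        std::uint16_t c1 = s1[i], c2 = s2[i];
        if (c1 != c2) {
            // Only when both units are >= 0xD800 can plain code unit
            // order disagree with code point order.
            if (c1 >= 0xD800 && c2 >= 0xD800) {
                c1 = fixup(c1);
                c2 = fixup(c2);
            }
            return (int)c1 - (int)c2;
        }
    }
    return n1 < n2 ? -1 : (n1 > n2 ? 1 : 0);
}

Note that the adjustment only runs at the first position where the two strings differ, and only when both code units fall in the 0xD800..0xFFFF region, so the common case costs nothing beyond the ordinary code unit comparison.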

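P.P.S. To illustrate the "no need to reinvent the wheel" point, here is a small sketch (mine, not taken from any of the pages above) assuming a reasonably recent ICU4C, linked with -licuuc. icu::UnicodeString keeps its text as UTF-16 code units, so counting code points, case mapping, and so on operate directly on that representation.

#include <iostream>
#include <string>
#include <unicode/unistr.h>   // icu::UnicodeString (UTF-16 internally)

int main() {
    // "café" plus U+1F600: 6 code points, 7 UTF-16 code units
    // (U+1F600 needs a surrogate pair).
    icu::UnicodeString s =
        icu::UnicodeString::fromUTF8("caf\xC3\xA9 \xF0\x9F\x98\x80");

    std::cout << "UTF-16 code units: " << s.length()      << "\n"   // 7
              << "code points:       " << s.countChar32() << "\n";  // 6

    // Case mapping works directly on the UTF-16 representation;
    // no conversion to UTF-32 is needed.
    s.toUpper();

    std::string utf8;
    s.toUTF8String(utf8);            // convert back to UTF-8 for output
    std::cout << utf8 << "\n";
    return 0;
}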