# Note: the entire IIIMF is hardwired to UTF-16, including
# server, language engines, all client side libraries/shared objects,
# and IIIM protocol. Whenever other encodings are required, it uses
# codeset converter, as seen in some client side code, such as in
# xiiimp.so in Xlib or XBackend. 

I am curious as to why UTF-16 was chosen (while being aware that
I'm probably not going to change anyone's mind). While UTF-16 may
make integration easier with Windows and Java, it has many
downsides compared to either UTF-8 or UTF-32.

UTF-16 is a variable-size encoding, while UTF-32 is fixed-size
UTF-16 is sensitive to machine byte order, while UTF-8 is independent of it
UTF-16 is sensitive to word alignment, while UTF-8 is not
UTF-16 is limited to characters up to 0x10FFFF, while both UTF-8
        and UTF-32 have room to expand beyond that should the need arise
UTF-16 is not backwards compatible with any existing ASCII-using
        software (compilers, libraries, protocols such as HTTP headers, etc.),
        which must use UTF-8 or else be completely changed/rewritten
UTF-16 breaks integer sorting, while both UTF-8 and UTF-32 sort
        naturally by Unicode code point. Example:
        UTF-32       UTF-8                UTF-16
        0xF012       0xEF 0x80 0x92       0xF012
        0x10010      0xF0 0x90 0x80 0x90  0xD800 0xDC10
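
For anyone who wants to check the table, here is a minimal Python sketch
using only the standard codecs. The helper names (encode, sorted_by) are
my own, and big-endian byte order is assumed so that byte-wise comparison
matches code-unit comparison. It also shows the variable-size and
byte-order points from the list above.

```python
# The two code points from the table above.
code_points = [0xF012, 0x10010]

def encode(cp, codec):
    """Encode a single code point with the given Python codec name."""
    return chr(cp).encode(codec)

def sorted_by(codec):
    """Order the code points by comparing their encoded byte strings."""
    return sorted(code_points, key=lambda cp: encode(cp, codec))

# Print each code point in the three encodings (hex, big-endian).
for cp in code_points:
    print(f"U+{cp:04X}",
          encode(cp, "utf-32-be").hex(),
          encode(cp, "utf-8").hex(),
          encode(cp, "utf-16-be").hex())

by_utf32 = sorted_by("utf-32-be")
by_utf8 = sorted_by("utf-8")
by_utf16 = sorted_by("utf-16-be")

# UTF-8 and UTF-32 keep code point order; UTF-16 puts the surrogate
# pair (0xD800 0xDC10) before 0xF012.
print("UTF-32 order:", [hex(cp) for cp in by_utf32])
print("UTF-8  order:", [hex(cp) for cp in by_utf8])
print("UTF-16 order:", [hex(cp) for cp in by_utf16])

# Variable size: one 16-bit unit versus a surrogate pair.
print(len(encode(0xF012, "utf-16-be")), len(encode(0x10010, "utf-16-be")))

# Byte order: the same code point gives different bytes in UTF-16LE
# and UTF-16BE, while UTF-8 has only one byte sequence.
print(encode(0xF012, "utf-16-le").hex(), encode(0xF012, "utf-16-be").hex())
```

A byte-wise comparison such as memcmp() or strcmp() therefore gives code
point order on UTF-8 data, but not on UTF-16 data once characters outside
the BMP are involved.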



UTF-16 appears to combine the worst characteristics of both
alternatives with none of their benefits. The overhead of converting
between Unicode encodings is relatively trivial; however, it is
cumulative with any other inefficiencies. Either UTF-32 or UTF-8
would be vastly more suitable for virtually any role.
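
To give a rough idea of what such a conversion involves, here is a small
Python sketch of a UTF-16 to UTF-8 step. This is not IIIMF code; the
function names are hypothetical and the input is assumed to be a list of
16-bit code units already in native byte order.

```python
def utf16_units_to_code_points(units):
    """Decode a sequence of 16-bit code units into Unicode code points."""
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: it must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError(f"unpaired high surrogate at index {i}")
            low = units[i + 1]
            code_points.append(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at index {i}")
        else:
            # Any other unit is a BMP code point on its own.
            code_points.append(unit)
            i += 1
    return code_points

def code_points_to_utf8(code_points):
    """Encode code points as UTF-8 using Python's built-in codec."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")

units = [0xF012, 0xD800, 0xDC10]
print(code_points_to_utf8(utf16_units_to_code_points(units)).hex())
```

The per-character work is small, but it has to be repeated at every
boundary where the data leaves the UTF-16 world.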



# BTW, "locale", "language" and "codeset" are three distinct
# things. Those terms are not interchangeable.
# I have said "hardwired to specific codeset", or "CSD - CodeSet
# Dependent" vs "CSI - CodeSet Independent", as different approaches
# for I18N, but I have not said as Kai summarized.

Another distinction: keyboard layout and input method are separate as well.
(A mistake made by MS's touted global IME: if you rearrange your
keys into Dvorak etc., then it won't work with any IM other than
the default.)
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
