> From: [EMAIL PROTECTED]
> I am curious as to why UTF-16 was chosen (while being aware that
> I'm probably not going to change anyone's mind).
We could actually have chosen any of UTF-8, UTF-16, or UCS-4 (at that
time, there was no UTF-32), and it probably does not matter much to me.
Since one of the goals was to design an IM protocol usable over slow
modem connections, the 1.5 times bigger footprint that UTF-8 needs for
BMP Unified Ideographs (3 bytes each instead of 2) was a factor to
consider. So we happened to choose UTF-16.
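
To make the footprint point concrete, here is a small sketch in Python.
The sample string and the helper name encoded_sizes are only
illustrations chosen for this message, not anything from IIIMF. It
assumes big-endian encodings without a BOM so that only the character
data is counted.

```python
# Four BMP CJK Unified Ideographs (U+6F22 U+5B57 U+5165 U+529B).
# The sample text is an arbitrary choice for illustration.
text = "\u6f22\u5b57\u5165\u529b"

def encoded_sizes(s):
    """Return the encoded byte length of s in each encoding form."""
    return {
        "utf-8": len(s.encode("utf-8")),
        # The "-be" codecs do not prepend a byte order mark.
        "utf-16": len(s.encode("utf-16-be")),
        "utf-32": len(s.encode("utf-32-be")),
    }

sizes = encoded_sizes(text)
print(sizes)
print(sizes["utf-8"] / sizes["utf-16"])
```

For this sample, UTF-8 takes 12 bytes against 8 for UTF-16, which is
the factor of 1.5 mentioned above; UTF-32 takes 16.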
> While UTF-16 may make integration easier with Windows and Java, it
> has many downsides compared to either UTF-8 or UTF-32.
While I agree with your list of downsides, they appeared in some
sense minor to me when we designed IIIMF.
As long as the project members know Unicode well, such downsides
are a piece of cake to resolve in actual code, while it is difficult
to reduce the 3 bytes UTF-8 needs for a BMP Unified Ideograph to 2 bytes.
> UTF-16 is a variable size encoding, while UTF-32 is fixed
Not really ;-). Imagine the use of in-band language tags, interlinear
annotation, variation selectors, grapheme joiner, etc. Regardless of
UTF-8/16/32, you have to program a text stream as if it were variable
length. Surrogate handling is just a minor addition to them.
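
A rough sketch of what I mean, again in Python. The sample (one
ideograph followed by a variation selector from the supplementary
planes) and the helper name code_units are assumptions picked for
illustration only.

```python
# One displayed character: an ideograph (U+845B) followed by
# VARIATION SELECTOR-17 (U+E0100). Sample chosen for illustration.
sample = "\u845b\U000e0100"

def code_units(s, encoding, unit_size):
    """Count code units of s in the given encoding (unit_size in bytes)."""
    return len(s.encode(encoding)) // unit_size

units = {
    "utf-8": code_units(sample, "utf-8", 1),
    "utf-16": code_units(sample, "utf-16-be", 2),  # selector needs a surrogate pair
    "utf-32": code_units(sample, "utf-32-be", 4),
}
print(units)
```

Here the single displayed character takes 7 code units in UTF-8, 3 in
UTF-16, and still 2 in UTF-32, so even UTF-32 code cannot assume one
unit per character.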
--
hiura@{freestandards.org,li18nux.org,unicode.org,sun.com}
Chair, Li18nux/Linux Internationalization Initiative, http://www.li18nux.org
Board of Directors, Free Standards Group, http://www.freestandards.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/