Hi Mark -

Yeah, I know... Unfortunately, there's still quite a lot of such hearsay going on in Japan.

That's why I suggested that you get ahold of the Unicode book and read the section about Han unification. I suggest it again; it is an eye-opener. I should warn you that it's not light reading... :-)

I don't know if the new edition of the Unicode book mentions the plane 14 assignments for tags. It should, because plane 14 is the final nail in the coffin for RFC 1554 and similar ISO 2022 based schemes where you infer a mapping from character set to language. In Unicode, you *know* the language, and with much finer granularity.
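
To make the tag mechanism concrete, here is a minimal Python sketch. It assumes the plane 14 layout in which U+E0001 (LANGUAGE TAG) introduces a tag and the tag characters U+E0020 through U+E007E mirror printable ASCII. The helper names are hypothetical, chosen only for illustration.

```python
LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG introduces a language tag
TAG_BASE = 0xE0000       # tag characters mirror ASCII at U+E0020..U+E007E


def make_language_tag(lang):
    """Build the plane 14 tag sequence for an ASCII language code such as 'ja'."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language code must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def read_language_tag(text):
    """Return (language, rest) if text starts with a language tag, else (None, text)."""
    if not text or ord(text[0]) != LANGUAGE_TAG:
        return None, text
    chars = []
    i = 1
    # Consume tag characters and shift each one back down to ASCII.
    while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
        chars.append(chr(ord(text[i]) - TAG_BASE))
        i += 1
    return "".join(chars), text[i:]


tagged = make_language_tag("ja") + "日本語"
language, body = read_language_tag(tagged)
```

The point of the sketch is that the language travels in the text itself, so nothing has to be inferred from which character set happened to be in effect.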

HTML has also rendered most of the plaintext arguments moot. Most people today want richer handling for multilingual text than just a possible font shift with a change of language.

It really does make things much easier to write an application that uses Unicode (as UTF-16) internally, and then to treat the other character sets as input and output representations.
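
A rough sketch of that architecture, assuming Python and its standard codecs: text is decoded to Unicode at the edge, handled as Unicode inside, and encoded again on the way out. Python's str stands in for the internal form here (it is not literally UTF-16), and the function names are hypothetical.

```python
def to_internal(raw, charset):
    """Decode bytes arriving in some external character set into Unicode text."""
    return raw.decode(charset)


def to_external(text, charset):
    """Encode internal Unicode text into the character set a peer expects."""
    return text.encode(charset)


# Incoming mail body in ISO-2022-JP, as a JIS-based mailer would send it.
wire = "日本語".encode("iso2022_jp")

text = to_internal(wire, "iso2022_jp")    # Unicode inside the application
as_euc = to_external(text, "euc_jp")      # one possible output representation
as_utf16 = to_external(text, "utf-16-be") # the UTF-16 form, made explicit
```

With this shape, supporting another legacy character set means adding one codec at the boundary rather than touching the rest of the application.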

A dirty little secret that isn't discussed much (it isn't politically correct) is that the 16-bit Basic Multilingual Plane is good enough for almost all practical purposes, including email. You can convert a JIS-based email application to use UCS-2 and nobody would be the wiser. Now, you won't be able to handle obscure Han characters last used 2000 years ago, but your JIS-based application couldn't do so either.
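
One way to check that claim against your own data is sketched below, assuming Python's standard iso2022_jp codec. The function name fits_in_ucs2 is hypothetical; it simply tests that every character lies in the Basic Multilingual Plane.

```python
def fits_in_ucs2(text):
    """True if every character is in the BMP, i.e. representable in 16 bits."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Text that came from a JIS-based application.
from_jis = "日本語のメール".encode("iso2022_jp").decode("iso2022_jp")
jis_ok = fits_in_ucs2(from_jis)

# A Han character from plane 2, which UCS-2 cannot represent.
plane2_ok = fits_in_ucs2("\U00020000")
```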

I certainly wouldn't recommend writing a new UCS-2 application today. New applications should use UTF-16 and fully support the 16 extension planes. However, for upgrading an old application, UCS-2 is likely to be good enough for several years.
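
For reference, the difference between UCS-2 and UTF-16 comes down to surrogate pairs. The following is a minimal Python sketch of the standard UTF-16 arithmetic; the function name is hypothetical.

```python
def to_utf16_units(cp):
    """Return the list of 16-bit code units that encode code point cp in UTF-16."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if cp <= 0xFFFF:
        # BMP character: one code unit, identical to UCS-2.
        return [cp]
    # Extension planes: split the 20-bit offset into a high and a low surrogate.
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


bmp_units = to_utf16_units(0x65E5)      # a BMP Han character
plane2_units = to_utf16_units(0x20000)  # first code point of plane 2
tag_units = to_utf16_units(0xE0001)     # plane 14 LANGUAGE TAG
```

An application that treats each 16-bit unit as a whole character is a UCS-2 application; supporting the extension planes means recognizing these pairs wherever text is indexed, truncated, or compared.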

Note as well that planes 1, 2, and 14 are the only extension planes currently in use, and they are likely to remain the only ones for quite some time. Fans of 36-bit computers would be happy to know that they can compress UCS-4 into an 18-bit halfword and not have that compression break for a long time! :-)
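
For the curious, here is one way the halfword trick could work, as a Python sketch. The slot assignment is my assumption for illustration, not an established scheme: planes 0, 1, 2, and 14 are mapped to a 2-bit slot number, which sits above the 16-bit offset within the plane for 18 bits in total.

```python
# Hypothetical slot assignment: four planes fit in two bits.
PLANE_TO_SLOT = {0: 0, 1: 1, 2: 2, 14: 3}
SLOT_TO_PLANE = {slot: plane for plane, slot in PLANE_TO_SLOT.items()}


def pack18(cp):
    """Pack a code point from plane 0, 1, 2, or 14 into an 18-bit value."""
    plane, offset = cp >> 16, cp & 0xFFFF
    if plane not in PLANE_TO_SLOT:
        raise ValueError("code point is outside planes 0, 1, 2, and 14")
    return (PLANE_TO_SLOT[plane] << 16) | offset


def unpack18(half):
    """Recover the original code point from an 18-bit packed value."""
    if not 0 <= half < (1 << 18):
        raise ValueError("value does not fit in 18 bits")
    return (SLOT_TO_PLANE[half >> 16] << 16) | (half & 0xFFFF)


packed_tag = pack18(0xE0001)            # plane 14 lands in slot 3
round_trip = unpack18(pack18(0x2A6D6))  # a plane 2 code point survives intact
```

The compression breaks only when a fifth plane comes into use, which is the "long time" referred to above.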

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.
