Markus Kuhn wrote:
>
> Do not include:
>
> a) character encodings that contain ASCII bytes in non-ASCII multibyte
> sequences (so BIG5, GB 18030, SJIS are not qualified I'm afraid, you
> really should use UTF-8 or EUC-* instead)
>
> b) character encodings that are not listed in the IANA registry
> under the proposed name as the preferred MIME name (so EUC-TW
> as described in Ken Lunde's book is not qualified unfortunately,
> and EUC-CN has to be called GB2312)

These two criteria effectively banish Traditional Chinese in zh_TW,
which uses either the de facto industry standard Big5 or, less
preferably, EUC-TW. Big5 fails criterion a while passing b; EUC-TW
passes a while failing b (it's not even in the IANA registry). I suspect
the lesser use of EUC-TW is why no one bothered to register it with
IANA. (Even if EUC-TW were not banned by criterion b, e.g., because
someone registers it, there is no established practice like the
Japanese case, where Shift-JIS and EUC-JP have co-existed more equally:
applications that understand both, and users accustomed to dealing with
multiple encodings. So no expedient switch can be made from Big5 to
EUC-TW when technically necessary.)
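
To make criterion a concrete, here is a rough sketch of a check for
ASCII bytes inside multibyte sequences. The helper name
has_ascii_in_multibyte is my own invention, and I am assuming Python's
stock codecs (big5, shift_jis, euc_jp, utf-8) are a fair stand-in for
the encodings under discussion. Python ships no EUC-TW codec, so that
one is not covered.

```python
def has_ascii_in_multibyte(text: str, encoding: str) -> bool:
    """Return True if any character of `text` encodes to a multibyte
    sequence that contains a byte in the ASCII range (< 0x80).

    Encoding one character at a time only works for stateless
    encodings, which is an assumption of this sketch.
    """
    for ch in text:
        seq = ch.encode(encoding)
        if len(seq) > 1 and any(b < 0x80 for b in seq):
            return True
    return False


SAMPLE_ZH = "\u8a31"  # Big5 encodes this as 0xB3 0x5C; 0x5C is a backslash
SAMPLE_JA = "\u8868"  # Shift-JIS encodes this as 0x95 0x5C

for sample, encoding in [
    (SAMPLE_ZH, "big5"),
    (SAMPLE_ZH, "utf-8"),
    (SAMPLE_JA, "shift_jis"),
    (SAMPLE_JA, "euc_jp"),
]:
    # Print the encoding name, the raw bytes in hex, and the verdict.
    print(encoding, sample.encode(encoding).hex(),
          has_ascii_in_multibyte(sample, encoding))
```

The Big5 and Shift-JIS samples trip the check because their trail byte
is 0x5C, while the UTF-8 and EUC-JP forms of the same characters use
only bytes of 0x80 and above.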

Similarly for Traditional Chinese in zh_HK, which when not using Big5
uses Big5-HKSCS. Big5-HKSCS fails criterion a (for the same reasons as
Big5) and b (although listed in the IANA registry, it is not a preferred
MIME name). (And there is no EUC-type equivalent/superset to
Big5-HKSCS, btw.)

That leaves only UTF-8 to handle those two cases of Traditional Chinese,
under those conditions. Yes, we all know it's better, but changes
don't happen overnight. Furthermore, Big5-HKSCS still can't be
represented satisfactorily in UTF-8 without using several hundred PUA
codepoints (as of Unicode 3.1).

P.S. GB18030 seems to be getting a lot of attention, perhaps because
it's new and is on a lot of people's minds (and it also fails criteria a
and b), but its predecessor GBK (a lot more real than GB18030) can also
be raised as an example that fails criterion a, along with Big5,
Shift-JIS, etc. But unlike Big5 and Shift-JIS, which can be replaced
with EUC-type encodings, GBK is like GB18030 in that it can't be
replaced with such.
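
For the GBK and GB18030 point, a small sketch that lists which bytes of
a multibyte sequence fall in the ASCII range. The helper name
ascii_bytes_in_multibyte is hypothetical, and I am assuming Python's
gbk, gb18030 and gb2312 codecs (the last being EUC-CN under its IANA
name) reflect the encodings as deployed.

```python
def ascii_bytes_in_multibyte(ch: str, encoding: str) -> list:
    """Return the ASCII-range bytes (< 0x80) found in the encoded form
    of a single character, or an empty list if the character encodes
    to one byte or to high bytes only."""
    seq = ch.encode(encoding)
    if len(seq) <= 1:
        return []
    return [b for b in seq if b < 0x80]


# U+4E02 is in GBK but not in GB2312; GBK encodes it as 0x81 0x40.
print(ascii_bytes_in_multibyte("\u4e02", "gbk"))

# U+0080 needs a GB18030 four-byte sequence: 0x81 0x30 0x81 0x30.
print(ascii_bytes_in_multibyte("\u0080", "gb18030"))

# U+4E2D is in GB2312, whose EUC-CN form is 0xD6 0xD0 (high bytes only).
print(ascii_bytes_in_multibyte("\u4e2d", "gb2312"))
```

The GB2312 form passes criterion a, but it covers only a subset of what
GBK and GB18030 can encode, which is why it is no replacement for them.
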
Thomas Chan
[EMAIL PROTECTED]
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/