RE: Red Hat 8 now uses UTF-8 by default for all non-CJK users

Kent Karlsson Wed, 27 Nov 2002 07:33:26 -0800

> Jason Maiorana wrote:
> > > Also, there is the problem that its really not possible to show both
> > > chinese and japanese together in a simple text document, because no
> > > one font can show chinese, simplified chinese, and japanese all at
> > > once.


[Traditional] Chinese and Simplified Chinese are not unified, except when
a "simplified" character was chosen in such a way that it coincided with
some "traditional" character. So yes, a Unicode font can easily display
Simplified and Traditional Chinese in a single plain text document.  As
for the supposed difference between Japanese and Chinese, it rather seems
to be a matter of typographic preference, not of the language of the text.
So *language* tags (of whatever form) do not help.
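
To make the first point concrete, here is a small sketch (my own
illustration, not taken from any standard text) using Python's standard
unicodedata module.  The character pair chosen, "country" in its
traditional and simplified forms, is just an assumed example; the helper
name describe() is made up for this message.

```python
import unicodedata

# Traditional and Simplified forms of "country" are separate code points,
# so a single font can carry a glyph for each of them.
traditional = "\u570b"  # traditional form
simplified = "\u56fd"   # simplified form


def describe(ch):
    """Return the code point and character name, e.g. 'U+570B ...'."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


print(describe(traditional))      # U+570B CJK UNIFIED IDEOGRAPH-570B
print(describe(simplified))       # U+56FD CJK UNIFIED IDEOGRAPH-56FD
print(traditional != simplified)  # True
```

Since the two forms are distinct characters, no tagging is needed to tell
them apart in plain text.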

Furthermore, the IRG has been, and still is, busy adding Han variants
to encode.  I cannot analyse their proposals, so I cannot tell what
is a variant of what.  If you really want a particular variant, go look
in extension B, or in the upcoming extension C...  Also lurking in the
wings are "variation selectors", anticipating more variants: the idea is
that further variants should not be given separate characters, but be
expressed with "variation selectors" instead.
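
A rough sketch of what the two mechanisms look like at the code point
level, assuming Python's unicodedata module.  The selectors in question
are named VARIATION SELECTOR-n in the character database.  The helper
in_extension_b() is a made-up name, and the base-plus-selector pairing
below is purely illustrative, not a registered variation sequence.

```python
import unicodedata

VS1 = "\ufe00"  # first of the variation selectors, U+FE00..U+FE0F


def in_extension_b(ch):
    """True if ch lies in the CJK Unified Ideographs Extension B block."""
    return 0x20000 <= ord(ch) <= 0x2A6DF


# Illustrative only: a base ideograph followed by a selector is still
# two code points in the text, not a new separately encoded character.
base = "\u570b"
seq = base + VS1

print(unicodedata.name(VS1))         # VARIATION SELECTOR-1
print(len(seq))                      # 2
print(in_extension_b("\U00020000"))  # True
print(in_extension_b(base))          # False
```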

Finally, the Unicode Consortium has started pondering "normalisation
tailoring", since some find the canonical mappings of some Han characters
"unhelpful".
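
Presumably the mappings meant here are those of the CJK compatibility
ideographs, which have canonical (singleton) decompositions and so
disappear under every normalisation form.  A minimal sketch, assuming
Python's unicodedata module and U+F900 as an arbitrarily picked example:

```python
import unicodedata

compat = "\uf900"  # CJK COMPATIBILITY IDEOGRAPH-F900

# Canonical normalisation replaces the compatibility ideograph by the
# unified ideograph it maps to; the distinction is lost afterwards.
normalized = unicodedata.normalize("NFC", compat)

print("U+%04X -> U+%04X" % (ord(compat), ord(normalized)))
# U+F900 -> U+8C48
```

Anyone who used the compatibility character to ask for a particular
glyph shape loses that request as soon as the text is normalised, which
is why some would like to tailor the process.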

> No single font can, and that's why these language tags have been added
> to Unicode 3.2.

Most definitely not.  Not only were they added earlier, but for a completely
different reason: to fight off a modified-UTF-8 proposal that threatened
to destabilise UTF-8 as an encoding.  THAT was the reason, none other.
Not even those who proposed the malformed modified-UTF-8, and were pacified
with the "plane 14" tag "characters", have picked up the idea.  (They
planned to use it for some protocol, but a protocol is syntax (among other
things), so language tagging can be done using ASCII characters...). So the
plane 14 "characters" have served their purpose, and should be laid to
rest (deprecated).
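
For those who have not seen them: the plane 14 tag "characters" are
U+E0001 LANGUAGE TAG followed by tag characters that mirror ASCII at an
offset of 0xE0000.  The sketch below shows how such a tag is spelled and
how a receiver can simply filter the whole range out.  The function
names are made up for this message, and the "ja" tag and sample text are
assumed examples.

```python
LANGUAGE_TAG = "\U000E0001"  # U+E0001 LANGUAGE TAG


def to_plane14_tag(lang):
    """Spell an ASCII language code with plane 14 tag characters."""
    return LANGUAGE_TAG + "".join(chr(0xE0000 + ord(c)) for c in lang)


def strip_plane14_tags(text):
    """Drop every code point in the tag range U+E0000..U+E007F."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


tagged = to_plane14_tag("ja") + "\u65e5\u672c\u8a9e"

print(["U+%04X" % ord(c) for c in to_plane14_tag("ja")])
# ['U+E0001', 'U+E006A', 'U+E0061']
print(strip_plane14_tags(tagged) == "\u65e5\u672c\u8a9e")  # True
```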

This would not deprecate language tagging per se, of course.  Language
tags are fine to have in "higher level protocols", like XML (or
something simpler).  Helpful for spell checking, automatic hyphenation,
perhaps language selection (among alternatives); but not at all helpful
for glyph selection, despite many such claims.
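
As an example of tagging in a higher level protocol, here is a sketch
that reads xml:lang attributes with Python's xml.etree.ElementTree.  The
element names doc and p and the sample content are invented for
illustration.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is bound to this namespace by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

source = (
    '<doc>'
    '<p xml:lang="ja">\u76f4</p>'
    '<p xml:lang="zh-TW">\u76f4</p>'
    '</doc>'
)
doc = ET.fromstring(source)

# The language lives in the markup; the character data is identical.
langs = [(p.get(XML_LANG), p.text) for p in doc.iter("p")]
print(langs[0][0], langs[1][0])    # ja zh-TW
print(langs[0][1] == langs[1][1])  # True
```

A spell checker or hyphenator can pick its rules from the attribute
value, while the text itself stays free of tag characters.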

                /kent k

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
