Thomas Chan wrote on 2001-02-03 20:27 UTC:
> For the vast majority of Japanese users, there is no issue, since they
> will be using Japanese (language) exclusively, and can use a Japanese
> font. The problem is with Japanese users who are dealing with
> multilingual texts and want to make an artificial segregation based on
> some unclear criteria (country? language? time period? character set?).
Well, they will have to select a font! Trivial. Just like we have to
select in any word processing document whether we want a phrase to be
typeset in Times Roman, Times Bold Italic, Courier, or Helvetica Narrow.
It's not rocket science: annotating text files with font specifications
has been common practice since the late 1950s, when phototypesetters
were first connected to computers!
I write in the same document (my thesis) English text in a Roman font,
Latin insets in italic, and computer source code in a Courier font. I do
this multilingual processing in ASCII, and it is really *EXACTLY* the
same as a Japanese/Chinese multilingual text.
What the Japanese geeks who complain about Unicode's Han unification
haven't understood (and I repeat this for the n-th time now here,
therefore everyone please excuse my slightly impatient and annoyed
tone) is that ISO 2022 is *not* a font selection mechanism and that they
have just been abusing it as such so far. Nobody outside Japan will
support that abuse of an encoding selection hack for font style
selection (except for a few poor i18n engineers brainwashed by marketing
departments who make them believe that the customers believe they
actually want this ISO 2022 mess).
If you need font selection, then build in font selection, for example
<FONT> in HTML.
Language markup is completely orthogonal to font markup as well.
Language markup/tagging is useful and urgently needed for correct
paragraph formatting, hyphenation, spell checking, sorting, etc.
Handling language tags really should have been outside the scope of a
character coding standard like Unicode. The Plane 14 language tags were
added in the first place only as a quite ugly, politically motivated
compromise intended to shut up Japanese ISO 2022 fanatics. Fortunately,
both W3C and Microsoft have decided not to use these tags in their
formats. HTML, Word.doc, and other common text formats already have
proper orthogonal font and language tagging and won't need any Plane 14
and variant glyph hacks.
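
To make the orthogonality concrete, here is a minimal sketch in Python.
It is not taken from any of the formats named above: the Run class, the
to_html helper, and the font names are hypothetical and serve only as an
illustration. The same code point U+76F4 appears in three runs, and the
language tag and the font are chosen independently each time.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Optional


@dataclass
class Run:
    """A stretch of text with independent language and font annotations."""
    text: str
    lang: Optional[str] = None  # language tag, e.g. "ja" or "zh-CN"
    font: Optional[str] = None  # font family (names below are made up)


def to_html(runs: List[Run]) -> str:
    """Serialize runs as HTML spans; lang and font are separate attributes."""
    parts = []
    for r in runs:
        attrs = []
        if r.lang:
            attrs.append(f'lang="{escape(r.lang)}"')
        if r.font:
            attrs.append(f'style="font-family: {escape(r.font)}"')
        body = escape(r.text)
        if attrs:
            parts.append(f"<span {' '.join(attrs)}>{body}</span>")
        else:
            parts.append(body)
    return "".join(parts)


# One code point, three combinations of language and font.
runs = [
    Run("\u76f4", lang="ja", font="ExampleMincho"),
    Run("\u76f4", lang="zh-CN", font="ExampleSong"),
    Run("\u76f4", lang="zh-CN", font="ExampleMincho"),  # Chinese text, Japanese-style font
]
print(to_html(runs))
```

The encoded text is identical in all three runs. Only the markup
differs, and changing the font never requires touching the language tag
or the character encoding.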
The Japanese geeks got too used to a tiny ISO 2022 subset being abused
as a mini-rich-text-format. If you want to have rich text functionality,
then please start to use a proper rich text format (HTML, RTF, Word.doc,
MIME text/rich, Emacs rich text, etc.) instead and make sure that these
formats fulfil your typographic needs. Please don't mess around with the
character encoding as a (stateful!) rich text infrastructure. The
introduction of ISO 10646 will hopefully move things in the right
direction: people will start to treat language and font tagging as
equally important and mostly independent issues, and will recognize
encoding tagging as historic ballast that's best forgotten quickly.
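
The "(stateful!)" remark can be illustrated with a small sketch. It
assumes Python's built-in iso2022_jp codec as a stand-in for the ISO
2022 subset under discussion; the sample string and variable names are
illustrative only.

```python
text = "ABC\u65e5\u672c\u8a9e"  # "ABC" followed by three kanji

iso = text.encode("iso2022_jp")
utf8 = text.encode("utf-8")

# ISO-2022-JP switches character sets with escape sequences:
# ESC $ B selects JIS X 0208, ESC ( B switches back to ASCII.
shift_in = b"\x1b$B"
shift_out = b"\x1b(B"

# Cut the kanji bytes out of the stream without their escape sequences.
start = iso.index(shift_in) + len(shift_in)
kanji_bytes = iso[start:-len(shift_out)]

# Without the preceding shift state, the decoder starts in ASCII mode
# and reads the very same bytes as plain ASCII characters.
misread = kanji_bytes.decode("iso2022_jp")

# UTF-8 has no shift state: a slice on a character boundary still
# decodes to the characters that were encoded.
utf8_slice = utf8[3:].decode("utf-8")

print(repr(iso))
print(repr(misread))
print(repr(utf8_slice))
```

The meaning of a byte in the ISO 2022 stream depends on an escape
sequence that may be arbitrarily far back, which is what makes the
encoding a poor carrier for any further annotation.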
Markus
--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: <http://www.cl.cam.ac.uk/~mgk25/>
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/