Re: Unicode is optimal for Chinese/Japanese multilingual texts

Ienup Sung Mon, 05 Feb 2001 12:21:27 -0800
I think many people have emphasized that software should support glyph
variation by relying on the current locale or whatever other mechanism is
available, ranging from font/fontset localization for simpler applications
to inbound/outbound tagged text. And I am glad that you also agree that
glyph variation is a reality and needs to be supported, whether that is
just a font remapping or not.

Also, I would like to point out that, even though Plane 14 was initially
a reaction to a draft RFC (if I remember correctly; I no longer recall
which one), I think such tagging can actually benefit future software,
since it can hint at which languages are being used even in a flat text
file. How much will we be able to exploit that feature of Unicode, now and
in the future? I have no data on that, but I hope and believe it is not a
complete waste of coding space or of our energy; in my opinion it can be a
very good thing. For instance, software could sort better using the
language tag, and the tag also hints at which glyph to use when an
assortment of glyphs is available. Maybe someone will argue the opposite
on this, and that's fine with me.
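To make the Plane 14 idea concrete, here is a minimal Python sketch, an illustration only and not any product's implementation. It assumes the tag code points as assigned in Unicode: U+E0001 LANGUAGE TAG, followed by tag characters U+E0020..U+E007E that mirror printable ASCII, with U+E007F CANCEL TAG ending the tag's scope. The helper names `tag_language` and `read_language_runs` are hypothetical, and the sample character 直 is simply a commonly cited case whose preferred glyph differs between Japanese and Chinese typography.

```python
from typing import List, Optional, Tuple

# Code points from Unicode's Plane 14 tag block.
LANGUAGE_TAG = 0xE0001
CANCEL_TAG = 0xE007F
TAG_OFFSET = 0xE0000  # U+E0020..U+E007E mirror ASCII 0x20..0x7E


def tag_language(lang: str, text: str) -> str:
    """Wrap text in a Plane 14 language tag (hypothetical helper)."""
    if not lang or not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language tag must be non-empty printable ASCII")
    tag = chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)
    # A lone CANCEL TAG ends the scope of all tags in effect.
    return tag + text + chr(CANCEL_TAG)


def read_language_runs(s: str) -> List[Tuple[Optional[str], str]]:
    """Split tagged text into (language, text) runs; None means untagged."""
    runs: List[Tuple[Optional[str], str]] = []
    lang: Optional[str] = None
    buf: List[str] = []

    def flush() -> None:
        # Emit the pending run under the language currently in effect.
        if buf:
            runs.append((lang, "".join(buf)))
            buf.clear()

    i = 0
    while i < len(s):
        cp = ord(s[i])
        if cp == LANGUAGE_TAG:
            flush()
            i += 1
            name = []
            # Collect the ASCII-mirroring tag characters that spell the language.
            while i < len(s) and 0xE0020 <= ord(s[i]) <= 0xE007E:
                name.append(chr(ord(s[i]) - TAG_OFFSET))
                i += 1
            lang = "".join(name) or None
            continue
        if cp == CANCEL_TAG:
            flush()
            lang = None
        else:
            buf.append(s[i])
        i += 1
    flush()
    return runs


if __name__ == "__main__":
    mixed = tag_language("ja", "直") + "/" + tag_language("zh", "直")
    for lang, run in read_language_runs(mixed):
        print(lang, run)
```

A renderer could use the recovered language of each run to pick a Japanese or a Chinese glyph for the same code point, and a sorter could pick a collation order the same way, even though the text itself is a flat string.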

I would also like to point out that, while I'm not exactly a fan of the
ISO 2022 codeset extension mechanism, whether one likes it or not, there is
a quite significant amount of text data in codesets built with ISO 2022 and
other codeset extension mechanisms, and that's the reality we have. And
everyone knows what such a "significant volume" means, especially in
business.
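As a small illustration of why such data takes care to process, the Python sketch below uses the standard library's `iso2022_jp` codec. In ISO-2022-JP the escape sequence ESC $ B switches into JIS X 0208 and ESC ( B switches back to ASCII, so the meaning of every byte depends on the escapes seen before it. The helper `strip_escapes` is hypothetical and exists only to show what happens when that state is lost.

```python
# Escape sequences ISO-2022-JP uses to switch the active character set.
ESC_JIS_X0208 = b"\x1b$B"  # switch to JIS X 0208 (two bytes per character)
ESC_ASCII = b"\x1b(B"      # switch back to ASCII


def strip_escapes(data: bytes) -> bytes:
    """Drop the shift sequences, i.e. lose the decoder state (hypothetical helper)."""
    return data.replace(ESC_JIS_X0208, b"").replace(ESC_ASCII, b"")


if __name__ == "__main__":
    encoded = "日本".encode("iso2022_jp")
    print(encoded)                       # escape, two 2-byte codes, escape
    print(encoded.decode("iso2022_jp"))  # the original Japanese text
    # The same payload bytes, read without the preceding escape, decode as ASCII.
    print(strip_escapes(encoded).decode("iso2022_jp"))
```

Cutting or concatenating such a stream at the wrong point silently changes the text, which is one reason stateful encodings complicate searching and splicing compared with a stateless encoding.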

With regards,

Ienup


] X-URL: http://www.cl.cam.ac.uk/~mgk25/
] Date: Sun, 04 Feb 2001 15:15:02 +0000
] From: Markus Kuhn <[EMAIL PROTECTED]>
] Subject: Re: Unicode is optimal for Chinese/Japanese multilingual texts
] 
] Thomas Chan wrote on 2001-02-03 20:27 UTC:
] > For the vast majority of Japanese users, there is no issue, since they
] > will be using Japanese (language) exclusively, and can use a Japanese
] > font.  The problem is with Japanese users who are dealing with
] > multilingual texts and want to make an artificial segregation based on
] > some unclear criteria (country? language? time period? character set?).
] 
] Well, they will have to select a font! Trivial. Just like we have to
] select in any word processing document whether we want a phrase to be
] typeset in Times Roman, Times Bold Italic, Courier, or Helvetica Narrow.
] It's not rocket science; it has been common practice to annotate text
] documents with font specifications in files since the late 1950s when
] phototypesetters were first connected to computers!
] 
] I write in the same document (my thesis) English text in a Roman font,
] Latin insets in italic, and computer source code in a Courier font. I do
] this multilingual processing in ASCII, and this is really *EXACTLY*
] the same as a Japanese/Chinese multilingual text.
] 
] What the Japanese geeks who complain about Unicode's Han unification
] haven't understood is simply this (and I repeat it for the n-th time now
] here, so everyone please excuse my slightly impatient and annoyed
] tone): ISO 2022 is *not* a font selection mechanism, and they
] have just been abusing it as such so far. Nobody outside Japan will
] support that abuse of an encoding selection hack for font style
] selection (except for a few poor i18n engineers brainwashed by marketing
] departments who make them believe that the customers believe they
] actually want this ISO 2022 mess).
] 
] If you need font selection, then build in font selection. For example
] <FONT> in HTML, etc. etc.
] 
] Language markup is completely orthogonal to font markup as well.
] Language markup/tagging is useful and urgently needed for correct
] paragraph formatting, hyphenation, spell checking, sorting, etc.
] Handling language tags really should have been outside the scope of a
] character coding standard like Unicode, and adding Plane 14 in the
] first place was a rather ugly, politically motivated compromise
] intended to shut up Japanese ISO 2022 fanatics. Fortunately, both W3C
] and Microsoft have decided not to use Plane 14 tags in their formats.
] HTML, Word.doc, and other common text formats already have proper
] orthogonal font and language tagging and won't need any Plane 14 and
] variant glyph hacks.
] 
] The Japanese geeks got too used to a tiny ISO 2022 subset being abused
] as a mini-rich-text-format. If you want to have rich text functionality,
] then please start to use a proper rich text format (HTML, RTF, Word.doc,
] MIME text/ rich, Emacs rich text, etc.) instead and make sure that these
] formats fulfil your typographic needs. Please don't mess around with the
] character encoding as a (stateful!) rich text infrastructure. The
] introduction of ISO 10646 will hopefully move things in the right
] direction and people will start to treat language and font tagging as
] equally important and mostly independent issues and will recognize
] encoding tagging as historic ballast that's best forgotten about
] quickly.
] 
] Markus
] 
] -- 
] Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
] Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
] 
] -
] Linux-UTF8:   i18n of Linux on all levels
] Archive:      http://mail.nl.linux.org/lists/