[Fonts]Han unification(SC and TC)(was..Re: Automatic 'lang' determination)

Jungshik Shin Sat, 29 Jun 2002 11:26:21 -0700

On Sat, 29 Jun 2002, Keith Packard wrote:

Ooops. My message crossed yours in mail :-)

> Around 9 o'clock on Jun 29, Jungshik Shin wrote:

> > IMHO, most problems with Han Unification arise not from using a _single_
> > font targeted at one of zh_TW/zh_CN/ja/ko to render a run of text in
> > another but from mixing _multiple_ fonts (with _drastically different_
> > design principle and other differences like baseline) to render a single
...

> Yes, I agree -- this is true in Western languages as well where the
....

  We agree with each other on this point, but still reach different
conclusions about zh-CN and zh-TW. I'm afraid that's because you have
been misinformed about how Han unification treats the simplified
and traditional forms of Chinese characters.

> > Suppose there's a document tagged as zh_TW that explains how PRC government
> > simplified Chinese characters to boost the literacy rate after WW II. If a
> > Big5 font (that doesn't cover all characters in the doc) is selected
> > instead of a GBK/GB18030 font (with the full coverage), simplified Han
> > characters(not used in Taiwan but only used in PRC) in the doc have to be
> > rendered with another font (most likely GB2312/GBK/GB18030 font).
>
> A correct version of this document would tag individual sections of the
> document with appropriate tags.  This way, the zh_TW sections could be
> presented in a traditional Chinese font while the mainland portions are
> displayed with simplified Chinese glyphs.

  Well, that would happen even without language tagging, and I
regard the result as _ugly_ for the reason I gave in my previous message.
Language tag or not, it would be just as ugly as setting most characters
in a Times Roman Latin-1 font with a couple of characters rendered in a
Palatino Latin-2 font.  My hypothetical document would not have separate
sections for zh-TW and zh-CN. Rather, occasional simplified forms of
Chinese characters (absent in Big5 fonts but present in GB2312/GBK/GB18030
fonts) would pop up among traditional forms of Chinese characters
(present in _both_ Big5 fonts and GBK/GB18030 fonts).

  IMHO, tagging the whole document as 'zh-TW' is perfectly valid,
and rendering it with a single GBK/GB18030 font (with full coverage of the
characters in the document) is better than mixing two fonts, one with Big5
coverage and the other with GBK/GB18030 coverage. The latter is what
happens if you exclude GBK/GB18030 fonts from zh-TW text rendering.
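
  To make the font-mixing point concrete, here is a minimal Python
sketch. It assumes that a legacy charset codec can stand in for the
coverage of a font designed for that charset; the font names
"big5-font" and "gbk-font" and the sample string are hypothetical, and
the per-character fallback is only an illustration, not how any
particular renderer works.

```python
from itertools import groupby


def covers(codec, ch):
    """True if `ch` is encodable in `codec` (a stand-in for font coverage)."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def font_runs(text, fonts):
    """Split `text` into runs, giving each character the first font covering it.

    `fonts` is an ordered list of (font_name, codec) pairs; both are
    hypothetical placeholders for real fonts and their coverage.
    """
    def pick(ch):
        for name, codec in fonts:
            if covers(codec, ch):
                return name
        return None  # no listed font covers this character

    return [(name, "".join(chars)) for name, chars in groupby(text, key=pick)]


# Mostly traditional forms, with one simplified form at the end.
text = "國字簡化為国"

# Big5 font preferred for zh-TW, GBK font used only as a fallback.
big5_first = font_runs(text, [("big5-font", "big5"), ("gbk-font", "gbk")])

# GBK font allowed for zh-TW: the whole string stays in one font.
gbk_only = font_runs(text, [("gbk-font", "gbk")])

print(big5_first)
print(gbk_only)
```

With the Big5 font preferred, the last character falls out into a
second run in a different font; with the GBK font alone, the string is
a single run.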

  Tagging individual simplified forms of Chinese characters
with 'lang=zh-CN' in the sea of traditional forms of Chinese characters
would only lead to a less-desirable result than otherwise possible.

> >  I'm not sure what you meant by 'glyph forms are more likely
> > simplified'. You might have misunderstood some aspects of Han Unification
> > in Unicode/10646.  In Unicode, simplified forms of Chinese characters are
> > NOT unified with corresponding traditional forms of Chinese characters.
>
> You're right -- I didn't believe this to be the case.  I had heard that the
> unified portion within the BMP do co-mingle simplified and traditional
> forms, but that the non-BMP Han extension provide separate codepoints for
> each.

  I'm afraid what you have heard about the BMP is misleading, if
I understood you correctly. Whether in the BMP or not, simplified forms of
Chinese characters are NOT UNIFIED with traditional forms of Chinese
characters. (Let me copy my message to John H. Jenkins @Apple, who knows a
lot more about Han Unification than I do.)  AFAIK, most complaints about
Han unification do NOT come from zh-CN vs zh-TW BUT from zh-CN/zh-TW
vs ja. For Han characters common to both zh-CN and zh-TW, there's no
significant difference in appearance between zh-CN and zh-TW. Although
many Japanese would not agree with me, I don't think there's any
significant difference across CJKV.  (Again, the ISO 10646 Han chart is a
good reference, along with ROC MOE's Han character variant dictionary at
http://140.111.1.40) To me, Han Unification should in a sense have gone
further (not less far), and it worries me that the non-BMP part includes
too many glyph variants (a whole bunch of them coming from Korean Buddhist
texts: see http://www.sutra.re.kr) that should have been unified in my eyes.
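
  As a quick check of the "not unified" point, the following Python
sketch looks at one simplified/traditional pair. The pair chosen and
the helper name in_charset are only for illustration, and a charset
codec is used here as a rough proxy for what a font built for that
charset would cover.

```python
import unicodedata


def in_charset(ch, codec):
    """True if `ch` can be encoded in the given legacy charset codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


SIMPLIFIED = "国"   # simplified form
TRADITIONAL = "國"  # corresponding traditional form

for ch in (SIMPLIFIED, TRADITIONAL):
    coverage = {c: in_charset(ch, c) for c in ("gb2312", "big5", "gbk")}
    # Each form has its own code point, both inside the BMP.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), coverage)
```

The two forms sit at different code points, and only the GBK
repertoire contains both of them.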

> If even BMP codepoints are separate, then it should be possible to create
> a large set of codepoints which could mark fonts as suitable for the
> display of simplified Chinese which are distinct from the set of
> codepoints suitable for the display of traditional Chinese.   That would
> be nicer than my current kludge of marking any font suitable for
> traditional Chinese as unsuitable for simplified Chinese.

How about this?

   if covers most of GB18030
      good for both zh-CN and zh-TW
      (and possibly good for ko)
   elif covers most of GBK
      good for both zh-CN and zh-TW
      (and possibly good for ko)
      not good for ja
   elif covers most of Big5
      good for zh-TW
      (and possibly good for ko)
      not good for ja
   elif covers most of GB2312
      good for zh-CN
      not good for ja
   elif covers most of KS X 1001 (and optionally KS X 1002)
      good for ko
      not good for ja
   elif covers most of JIS 0208/0212 (and optionally 0213)
      good for ja
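
  The rule above could be sketched in Python roughly as follows. Several
things here are assumptions of the sketch, not part of the proposal:
"covers most of" is taken to mean at least 90% of the repertoire; each
repertoire is recovered by decoding two-byte sequences with Python's
codecs; GB18030 is approximated as GBK plus CJK Extension A; and
KS X 1002, JIS 0212 and JIS 0213 are left out. The names repertoire,
standard_repertoires, RULES and classify are made up for illustration.

```python
def repertoire(codec, lead_start):
    """Characters reachable as valid two-byte sequences in `codec`."""
    chars = set()
    for lead in range(lead_start, 0xFF):
        for trail in range(0x40, 0xFF):
            try:
                text = bytes([lead, trail]).decode(codec)
            except UnicodeDecodeError:
                continue  # not a valid sequence in this charset
            if len(text) == 1:
                chars.add(text)
    return chars


def standard_repertoires():
    """Approximate repertoires, keyed by the names used in RULES."""
    gbk = repertoire("gbk", 0x81)
    # Assumption: GBK plus CJK Extension A stands in for GB18030.
    ext_a = {chr(cp) for cp in range(0x3400, 0x4DB6)}
    return {
        "GB18030": gbk | ext_a,
        "GBK": gbk,
        "Big5": repertoire("big5", 0x81),
        "GB2312": repertoire("gb2312", 0xA1),
        "KS X 1001": repertoire("euc_kr", 0xA1),
        "JIS X 0208": repertoire("euc_jp", 0xA1),
    }


# Same order as the if/elif chain; "maybe" encodes "possibly good for".
RULES = [
    ("GB18030", {"zh-CN": "good", "zh-TW": "good", "ko": "maybe"}),
    ("GBK", {"zh-CN": "good", "zh-TW": "good", "ko": "maybe", "ja": "bad"}),
    ("Big5", {"zh-TW": "good", "ko": "maybe", "ja": "bad"}),
    ("GB2312", {"zh-CN": "good", "ja": "bad"}),
    ("KS X 1001", {"ko": "good", "ja": "bad"}),
    ("JIS X 0208", {"ja": "good"}),
]


def classify(font_chars, repertoires, threshold=0.9):
    """Return (matched repertoire, per-language verdict) for a font.

    `font_chars` is the set of characters the font covers. The first
    rule whose repertoire is covered to at least `threshold` wins.
    """
    for name, verdict in RULES:
        rep = repertoires[name]
        if rep and len(rep & font_chars) >= threshold * len(rep):
            return name, verdict
    return None, {}


if __name__ == "__main__":
    reps = standard_repertoires()
    # A hypothetical font covering exactly the Big5 repertoire.
    print(classify(reps["Big5"], reps))
```

Because the rules are tried in order, a font with GBK coverage is
matched by the GBK rule before the Big5 and GB2312 rules are ever
consulted, which is what keeps it eligible for both zh-CN and zh-TW.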

  Jungshik Shin

_______________________________________________
Fonts mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/fonts
