On Sun, 9 Sep 2001, Tomohiro KUBOTA wrote:
> However, there is another problem: the Unicode Consortium
> has abolished all EastAsian cross mapping tables.

I thought they were just transferred to ICU and that the converter
behaviour of ICU is supposed to become some form of reference standard
eventually.

In any case, the Unicode Consortium still maintains the Unihan database,
which is the official and up-to-date source of cross-reference
information for the Han/Kanji part of the CJK character sets. It is just
slightly more difficult to process, as its information is more
sophisticated than a simple one-to-one table. The availability of the
naive EastAsian tables kept inexperienced writers of encoding converters
from using the Unihan information properly to generate the required
many-to-one mapping tables.
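
As a rough illustration of what processing Unihan involves, here is a
minimal Python sketch. It assumes the tab-separated "U+XXXX, field name,
value" line layout of the Unihan data files and uses the kJis0 field
(JIS X 0208 in ku/ten form). The function name parse_unihan_field and the
three-line sample are hypothetical and only for illustration; a real
converter would read the full file and would also have to look at the
variant fields.

```python
from collections import defaultdict


def parse_unihan_field(lines, field):
    """Collect legacy code -> set of Unicode code points for one field.

    Hypothetical helper: groups by legacy code so that several Unicode
    characters can end up sharing one legacy code (many-to-one).
    """
    table = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3 or parts[1] != field:
            continue  # not the field we are interested in
        codepoint = int(parts[0][2:], 16)  # drop the leading "U+"
        for value in parts[2].split():  # a field may list several values
            table[value].add(codepoint)
    return table


# Illustrative excerpt only, not the real file.
sample = [
    "# illustrative excerpt",
    "U+4E00\tkJis0\t1676",
    "U+4E9C\tkJis0\t1601",
    "U+4E00\tkTotalStrokes\t1",
]
jis0 = parse_unihan_field(sample, "kJis0")
```

Keying the table by the legacy code rather than by the Unicode character
is a deliberate choice here: it leaves room for more than one Unicode
character per legacy code, which a plain one-to-one table cannot express.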

Also, doesn't JIS X 0221 contain the really "official" one-to-one mapping
table already? Do you have a copy of JIS X 0221?

> http://www.debian.or.jp/~kubota/unicode-symbols.html

It's not as serious a problem as you make it sound. It would be nice if
you could add to the above page a link to

  http://www.cl.cam.ac.uk/~mgk25/unicode.html#conv

which explains the fairly simple solution to pretty much all of the
problems you point out, namely the generation of many-to-one mapping
tables from Unicode database normalization information and the Unihan
database. What you write about is more a misunderstanding of the purpose
of the old mapping tables than an actual bug.
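
To make the idea concrete, here is a small Python sketch of deriving a
many-to-one table from a one-to-one table using normalization data. It is
only an outline under assumptions: the helper name many_to_one is made up,
NFKC is my choice of normalization form for the example (a real converter
may want a different or more selective notion of equivalence), and the
two-entry base table stands in for a full ISO 8859-1 mapping.

```python
import unicodedata


def many_to_one(one_to_one, candidates):
    """Extend a one-to-one Unicode -> legacy table via NFKC equivalence.

    Hypothetical helper: any candidate whose NFKC form equals the NFKC
    form of an already mapped character gets the same legacy code.
    """
    # Index the existing entries by their normalized form.
    by_normal_form = {}
    for char, code in one_to_one.items():
        by_normal_form.setdefault(unicodedata.normalize("NFKC", char), code)

    table = dict(one_to_one)
    for char in candidates:
        if char in table:
            continue  # already mapped directly
        code = by_normal_form.get(unicodedata.normalize("NFKC", char))
        if code is not None:
            table[char] = code
    return table


# Two ISO 8859-1 entries: "A" (0x41) and A WITH RING ABOVE (0xC5).
base = {"A": 0x41, "\u00C5": 0xC5}
extra = [
    "\u212B",   # ANGSTROM SIGN
    "\uFF21",   # FULLWIDTH LATIN CAPITAL LETTER A
    "A\u030A",  # "A" followed by COMBINING RING ABOVE
    "\u4E00",   # a Han character with no equivalent in the base table
]
table = many_to_one(base, extra)
```

With this, the Angstrom sign and the decomposed sequence both end up at
0xC5 and the fullwidth A at 0x41, while the Han character stays unmapped.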

Unfortunately, iconv does not yet perform proper normalization-invariant
many-to-one conversion either. Volunteers welcome.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
