Gaspar Sinai writes:

> I also would like to re-iterate that Unihan-3.2.0.txt
> is not the right source to build these font maps

Unihan-3.2.0.txt is certainly the most authoritative source for the
Han part of the mapping table.

> o JIS-0201 is totally undefined

The JIS0201 mapping used in Shift_JIS and EUC-JP has been well known
for years and is not subject to debate.

> o Thousands of JIS->UNICODE codepoints are missing.
>   A `grep kJIS0213  Unihan-3.2.0.txt | wc -l` may prove it
>   (It should be more than 11 thousand instead of the miserable 3627).
>   For instance (just to pick a missing one):
>   JIS X 0213 1-16-17 (Plane2 0x2435) -> U+8466

The entries you are missing are the JISX0208 characters. JISX0213
plane 1 is an extension of JISX0208, so there is no need to list these
characters twice in Unihan-3.2.0.txt, once for 0208 and once for 0213.
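
To make this concrete, here is a minimal sketch of how the two fields
could be merged into one JISX0213 view. It is only an illustration, not
the procedure actually used. It assumes that each data line has the
layout "U+XXXX<TAB>field<TAB>value", that the JISX0208 field is named
kJIS0 and holds a four-digit decimal row/cell value, and that kJIS0213
holds "plane,row,cell" as in the line quoted from the file. The function
name and the two sample lines are made up for the example.

```python
import re

# Assumed line layout: "U+XXXX<TAB>kField<TAB>value".
LINE = re.compile(r"^U\+([0-9A-F]{4,6})\s+(kJIS0|kJIS0213)\s+(\S+)$")

def jisx0213_entries(lines):
    """Yield ((plane, row, cell), code point) pairs from Unihan-style lines."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # comment line or an unrelated field
        ucs, field, value = int(m.group(1), 16), m.group(2), m.group(3)
        if field == "kJIS0":
            # Assumption: four decimal digits, row then cell.
            # A JISX0208 character sits at the same position in plane 1.
            row, cell = divmod(int(value), 100)
            yield (1, row, cell), ucs
        else:
            plane, row, cell = (int(x) for x in value.split(","))
            yield (plane, row, cell), ucs

# Two sample lines for illustration only.
sample = [
    "U+4E9C\tkJIS0\t1601",
    "U+9B1C\tkJIS0213\t2,93,27",
]
table = dict(jisx0213_entries(sample))
```

Counting the kJIS0213 lines alone, as the grep above does, therefore
undercounts: the plane 1 characters inherited from JISX0208 only show up
once the kJIS0 lines are included as well.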

> o There are errors:
>   Unihan-3.2.0.txt:
>   U+9B1C  kJIS0213        2,93,27
>   Should be:
>   U+9B1D  kJIS0213        2,93,27

Also there are two oddities:
  - 0x12B65 (1-11-69) is unmapped in Unicode
  - There is a duplicate: 0x1745C (1-84-60) and 0x17624 (1-86-4) map
    to the same character U+FA3E.
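
The hexadecimal numbers above pack plane, row and cell into one value:
the plane goes in the top digit, and row and cell each get 0x20 added,
as in the usual 94x94 GL encoding. A small sketch of that conversion
(the function name is hypothetical, chosen for this example):

```python
def jisx0213_code(plane, row, cell):
    """Pack plane-row-cell into the 0xPRRCC notation used above."""
    return (plane << 16) | ((row + 0x20) << 8) | (cell + 0x20)

# The three positions mentioned above.
codes = [f"0x{jisx0213_code(*p):05X}" for p in [(1, 11, 69), (1, 84, 60), (1, 86, 4)]]
```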

I have created a mapping table from JISX0213 to Unicode and put it at
   ftp://ftp.ilog.fr/pub/Users/haible/jisx0213/

It was built from the following sources:
  - Unihan-3.2.0.txt
  - the Mule-UCS sources
  - creative use of sed, grep, join
  - ISO-IR 228, 229 and a Shift_JISX0213 chart
  - the Unicode 3.2 glyph charts

Bruno
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
