Gaspar Sinai writes:
> I would also like to reiterate that Unihan-3.2.0.txt
> is not the right source for building these font maps.
Unihan-3.2.0.txt is certainly the most authoritative source for the
Han part of the mapping table.
> o JIS-0201 is totally undefined
The JIS0201 mapping to Shift_JIS and EUC-JP has been well known for years
and is not subject to debate.
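For reference, here is a minimal Python sketch of that mapping as it is commonly tabulated (for example in the Unicode Consortium's old JIS0201.TXT): the Roman half is ASCII except that 0x5C is YEN SIGN and 0x7E is OVERLINE, and 0xA1..0xDF are the half-width katakana U+FF61..U+FF9F. The function names are hypothetical, and passing 0x00..0x1F and 0x7F through unchanged is an assumption of the sketch.

```python
from typing import Optional

def jisx0201_to_unicode(byte: int) -> Optional[int]:
    """Map one JIS X 0201 byte to a Unicode code point (None if unassigned)."""
    if byte == 0x5C:
        return 0x00A5            # YEN SIGN instead of REVERSE SOLIDUS
    if byte == 0x7E:
        return 0x203E            # OVERLINE instead of TILDE
    if byte <= 0x7F:
        return byte              # rest of the Roman half equals ASCII (controls assumed pass-through)
    if 0xA1 <= byte <= 0xDF:
        return byte + 0xFEC0     # half-width katakana U+FF61..U+FF9F
    return None                  # 0x80..0xA0 and 0xE0..0xFF are unassigned

def decode_jisx0201(data: bytes) -> str:
    """Decode a JIS X 0201 byte string, raising on unassigned bytes."""
    chars = []
    for b in data:
        ucs = jisx0201_to_unicode(b)
        if ucs is None:
            raise ValueError(f"unassigned JIS X 0201 byte 0x{b:02X}")
        chars.append(chr(ucs))
    return "".join(chars)
```

For example, `decode_jisx0201(b"\xb1\x5c")` gives "\uff71\u00a5". In Shift_JIS these values appear as single bytes; in EUC-JP the katakana bytes are preceded by SS2 (0x8E). Some vendor converters map 0x5C and 0x7E to plain ASCII instead, which is why the choice is spelled out in the comments.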
> o Thousands of JIS->UNICODE codepoints are missing.
> A `grep kJIS0213 Unihan-3.2.0.txt | wc -l` may prove it
> (It should be more than 11 thousand instead of the miserable 3627.)
> For instance (just to pick a missing one):
> JIS X 0213 1-16-17 (Plane2 0x2435) -> U+8466
These are JISX0208 characters. As you know, JISX0213 plane 1 is an
extension of JISX0208; therefore there is no need to list these
characters twice in Unihan-3.2.0.txt, once for 0208 and once for 0213.
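To make the consequence concrete: counting kJIS0213 lines only counts the characters added by JISX0213, and plane 1 has to be assembled from the JISX0208 table plus those additions. The Python sketch below is hypothetical (function names and the pre-extracted JISX0208 dict are assumptions); it parses lines of the form "U+XXXX kJIS0213 plane,row,cell" and overlays the plane-1 entries onto a JISX0208 map keyed by (row, cell).

```python
from typing import Dict, Iterable, Tuple

JisCode = Tuple[int, int, int]   # (plane, row, cell), as in "2,93,27"

def parse_kjis0213(lines: Iterable[str]) -> Dict[JisCode, int]:
    """Collect lines of the form 'U+XXXX kJIS0213 plane,row,cell'."""
    table: Dict[JisCode, int] = {}
    for line in lines:
        fields = line.split()    # works for tab- or space-separated fields
        if line.startswith("#") or len(fields) != 3 or fields[1] != "kJIS0213":
            continue
        plane, row, cell = (int(x) for x in fields[2].split(","))
        table[(plane, row, cell)] = int(fields[0][2:], 16)  # strip "U+"
    return table

def build_plane1(jisx0208: Dict[Tuple[int, int], int],
                 kjis0213: Dict[JisCode, int]) -> Dict[Tuple[int, int], int]:
    """Plane 1 = the JISX0208 characters plus the plane-1 kJIS0213 additions."""
    plane1 = dict(jisx0208)      # inherited characters are not repeated in kJIS0213
    for (plane, row, cell), ucs in kjis0213.items():
        if plane == 1:
            plane1[(row, cell)] = ucs
    return plane1
```

With a JISX0208 map containing `{(16, 17): 0x8466}`, the character from Gaspar's example ends up in plane 1 even though no kJIS0213 line mentions it; `len(parse_kjis0213(...))` is the analogue of the `grep | wc -l` count and is expected to be much smaller than the full plane.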
> o There are errors:
> Unihan-3.2.0.txt:
> U+9B1C kJIS0213 2,93,27
> Should be:
> U+9B1D kJIS0213 2,93,27
There are also two oddities:
- 0x12B65 (1-11-69) is unmapped in Unicode
- There is a duplicate: 0x1745C (1-84-60) and 0x17624 (1-86-4) map
to the same character U+FA3E.
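The hexadecimal codes above pack plane, row and cell as plane*0x10000 + (row+0x20)*0x100 + (cell+0x20), so 1-11-69 becomes 0x12B65 and 1-84-60 becomes 0x1745C. A small hypothetical Python sketch of that packing, plus the kind of checks that surface oddities like these: a Unicode value reached from two JIS codes, or a JIS code with no mapping at all.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def to_hex_code(plane: int, row: int, cell: int) -> int:
    """Pack plane-row-cell into the 0xPRRCC form, e.g. 1-11-69 -> 0x12B65."""
    return (plane << 16) | ((row + 0x20) << 8) | (cell + 0x20)

def from_hex_code(code: int) -> Tuple[int, int, int]:
    """Inverse of to_hex_code: 0x12B65 -> (1, 11, 69)."""
    return (code >> 16, ((code >> 8) & 0xFF) - 0x20, (code & 0xFF) - 0x20)

def find_duplicates(table: Dict[int, int]) -> Dict[int, List[int]]:
    """Return Unicode code points that are reached from more than one JIS code."""
    sources: Dict[int, List[int]] = defaultdict(list)
    for jis, ucs in sorted(table.items()):
        sources[ucs].append(jis)
    return {ucs: jis for ucs, jis in sources.items() if len(jis) > 1}

def find_unmapped(codes: Iterable[int], table: Dict[int, int]) -> List[int]:
    """Return the JIS codes from `codes` that have no Unicode mapping."""
    return [c for c in codes if c not in table]
```

Run over a complete table, `find_duplicates` would report U+FA3E with both 0x1745C and 0x17624, and `find_unmapped` would list 0x12B65 if it is fed every assigned code point.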
I have created a JISX0213 to Unicode mapping table, available at
ftp://ftp.ilog.fr/pub/Users/haible/jisx0213/
It was built from the following sources:
- Unihan-3.2.0.txt
- the Mule-UCS sources
- creative use of sed, grep, join
- ISO-IR 228, 229 and a Shift_JISX0213 chart
- the Unicode 3.2 glyph charts
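The sed/grep/join step amounts to merging tables keyed on the JIS code and looking at where they disagree. A hypothetical Python sketch of that idea (not the actual scripts used): entries agreed on, or present in only one source, are merged, and disagreements such as the U+9B1C/U+9B1D case are set aside for checking against the charts.

```python
from typing import Dict, List, Optional, Tuple

def cross_check(a: Dict[int, int], b: Dict[int, int]
                ) -> Tuple[Dict[int, int], List[Tuple[int, int, int]]]:
    """Merge two JIS->Unicode tables keyed on the JIS code, like join(1).

    Returns the merged table and a list of (jis, ucs_a, ucs_b) conflicts
    that are left out of the merge for manual review.
    """
    merged: Dict[int, int] = {}
    conflicts: List[Tuple[int, int, int]] = []
    for jis in sorted(a.keys() | b.keys()):
        ua: Optional[int] = a.get(jis)
        ub: Optional[int] = b.get(jis)
        if ua is not None and ub is not None and ua != ub:
            conflicts.append((jis, ua, ub))      # sources disagree
        else:
            merged[jis] = ua if ua is not None else ub  # agree, or only one has it
    return merged, conflicts
```

With 2-93-27 packed as 0x27D3B, a source mapping it to U+9B1C and another mapping it to U+9B1D would produce the conflict `(0x27D3B, 0x9B1C, 0x9B1D)`.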
Bruno
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/