On Sat, 30 Mar 2002, Gaspar Sinai wrote:

> I noticed that at ftp.unicode.org /Public/MAPPINGS/EASTASIA
> has been moved to OBSOLETE directory. README.TXT reads:
>   The entire former contents of this directory are obsolete
>   and have been moved to the OBSOLETE directory.  The latest
>   information may be found in the Unihan.txt file in the latest
>   Unicode Character Database.
>   August 1, 2001.
> I looked at Unihan.txt file but I found no way to extract
> GB2312.TXT JIS0208.TXT JIS0212.TXT KSC5601.TXT (KSX1001.TXT?)
> OLD5601.TXT and JIS0201.TXT files.

  KSC5601.TXT in OBSOLETE/EASTASIA is NOT the mapping
between Unicode and KS C 5601-1987 but the mapping between MS CP949 and
Unicode (sans the US-ASCII portion).  OLD5601.TXT is the mapping between
KS C 5601-1987 (also KS C 5601-1992 and KS X 1001:1997) and Unicode 2.0.
So is KSX1001.TXT.

> For instance:
> JIS0201.TXT:
> cd Public/UNIDATA
> grep -i FF71 *.* | grep -i B1
> proves that neither Unihan.txt nor any of the other UNIDATA
> files can be used to generate JIS0201.TXT.
> The question is: What is the best source for these maps?
> Is there a place where they are centrally maintained?

  You can extract two different mappings between EUC-KR
and Unicode from CP949.TXT (in VENDORS/MICSFT/) and KOREAN.TXT
(in VENDORS/APPLE).  Just filter out the non-EUC portion and keep the EUC
code points only (that is, 0x00-0x7E for single-byte characters and
[0xA1-0xFE][0xA1-0xFE] for double-byte characters).  If you want
the mapping between KS X 1001 and Unicode, you can subtract 0x8080 from
the code points of the two-byte characters in EUC-KR.  I've put them up at:

   http://jshin.net/faq/KSX1001.TXT.gz  (extracted from CP949.TXT)
   http://jshin.net/faq/JOHAB.TXT.gz    (for Johab)
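
 The filtering described above could be sketched roughly as follows.
This is only an illustration, not the script used to produce the files
listed above.  It assumes each data line of CP949.TXT has the form
"0xXXXX<TAB>0xYYYY<TAB>#NAME" and that comment and undefined lines can be
skipped.  The names is_euc_kr and extract are made up for this example.
Apple's KOREAN.TXT maps some characters to sequences of code points, which
this sketch does not handle.

```python
import re

# Assumed data-line format: "0xA1A1<TAB>0x3000<TAB>#IDEOGRAPHIC SPACE".
# Lines without a Unicode column (comments, undefined bytes) do not match.
_LINE = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def is_euc_kr(code):
    """True for 0x00-0x7E single bytes and [0xA1-0xFE][0xA1-0xFE] pairs."""
    if code <= 0xFF:
        return code <= 0x7E
    lead, trail = code >> 8, code & 0xFF
    return code <= 0xFFFF and 0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE


def extract(lines, ksx1001=False):
    """Return (code, unicode) pairs restricted to the EUC-KR range.

    With ksx1001=True, single-byte entries are dropped and 0x8080 is
    subtracted from each two-byte code to give the KS X 1001 code.
    """
    pairs = []
    for line in lines:
        match = _LINE.match(line)
        if not match:
            continue
        code = int(match.group(1), 16)
        uni = int(match.group(2), 16)
        if not is_euc_kr(code):
            continue  # CP949 extension area, not part of EUC-KR
        if ksx1001:
            if code <= 0xFF:
                continue  # KS X 1001 itself has no single-byte part
            code -= 0x8080
        pairs.append((code, uni))
    return pairs
```

 To run it over a local copy of the table, something like
extract(open("CP949.TXT", encoding="ascii", errors="replace"), ksx1001=True)
should do, assuming the file has been downloaded to the current directory.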

 The differences between the two mappings are well explained in Apple's
Korean mapping table, KOREAN.TXT.
Another difference is that Apple's Korean mapping doesn't have the two new
characters added to KS X 1001 in December 1998.  They're EURO SIGN
(U+20AC) at row 2 column 70 (0xA2E6 in EUC-KR and 0x2266 in ISO-2022-KR)
and REGISTERED SIGN (U+00AE) at row 2 column 71 (0xA2E7 in EUC-KR and
0x2267 in ISO-2022-KR).  Glibc and libiconv have already added them.
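
 The relation between the row/column positions and the byte values quoted
above can be checked with a few lines of arithmetic.  This is a small
sketch only; the function name row_col_to_codes is made up for the example.

```python
def row_col_to_codes(row, col):
    """Convert a KS X 1001 row/column (1-94 each) to its two-byte codes.

    Returns (seven_bit, euc_kr): the 7-bit form used in ISO-2022-KR and
    the EUC-KR form, which is the same value plus 0x8080.
    """
    if not (1 <= row <= 94 and 1 <= col <= 94):
        raise ValueError("row and column must be in 1..94")
    seven_bit = ((row + 0x20) << 8) | (col + 0x20)
    return seven_bit, seven_bit + 0x8080


# EURO SIGN at row 2, column 70
print([hex(c) for c in row_col_to_codes(2, 70)])  # ['0x2266', '0xa2e6']
# REGISTERED SIGN at row 2, column 71
print([hex(c) for c in row_col_to_codes(2, 71)])  # ['0x2267', '0xa2e7']
```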

   Jungshik Shin

Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
