On Saturday 06 April 2002 07:02, Tomohiro KUBOTA wrote:
> At Sat, 6 Apr 2002 19:16:50 +0900 (JST), Gaspar Sinai wrote:
> > I would like to have local encoding -> Unicode  character
> > mappings published by the Consortium - otherwise I am not
> > convinced that Unicode is supporting characters in national
> > standards - which may question its usability.
>
> I think so.  You may be surprised to hear that Unicode
> Consortium does not release Unicode <-> JIS X 0208 mapping
> table.  There are at least several incompatible mapping tables
> and I have sent mail to Unicode Consortium to deal with this
> problem for a few times.

I don't see a problem with documenting the various vendor character 
mappings, labelling them as such, and creating separate, not 
necessarily consistent, Unicode mappings for JIS X 
{0208-{1978|1983|1990|1997} | 0212-{1990|1992}} and the other JIS 
character standards. However, there does not appear to be a way we 
can get *one* authoritative *and* correct mapping for each of the 
components of this mishmash any time soon.  Certainly if Unicode 
and ISO tried it again, there would be howls of outrage from those 
who already see Unicode/ISO 10646 as an attack on Japanese culture. 
At the moment, we can't even fix the mapping of the JIS-Roman Yen 
character to ASCII backslash (Unicode REVERSE SOLIDUS), rather than 
to Unicode YEN SIGN, that pollutes the Microsoft CJK mappings and 
fonts.
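
To make the Yen/backslash point concrete, here is a small Python 
sketch that asks a few converters what they do with the byte 0x5C. 
The codec names are the ones Python's standard library happens to 
ship, which is my assumption for illustration; other converters may 
well answer differently, and that is exactly the problem.

```python
import unicodedata


def name_of_0x5c(codec: str) -> str:
    """Return the Unicode character name that `codec` assigns to byte 0x5C."""
    return unicodedata.name(b"\x5c".decode(codec))


# Codec names below are Python's own; other tools use other names.
for codec in ("ascii", "cp932", "shift_jis"):
    print(f"{codec:10} 0x5C -> {name_of_0x5c(codec)}")

# The two candidate targets are distinct Unicode code points.
print("U+005C:", unicodedata.name("\u005c"))
print("U+00A5:", unicodedata.name("\u00a5"))
```

Whichever name a converter reports for 0x5C, text that went in as a 
Yen sign and comes out as U+005C can no longer be told apart from a 
real backslash.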

Does anybody know what the iconv developers have in mind for this? 
The man page at 
http://www.research.att.com/sw/tools/uwin/man/man1/iconv.html 
lists shift-jis, shift_jis, euc-jp, 
Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc-jp, x-sjis, 
_iso-2022-jp, but does not say which JIS standards these encodings 
support. 

I understand that each of the SJIS, EUC-JP, and ISO-2022-JP 
*encodings* covers multiple JIS *character sets*. I haven't read 
anything that explains the situation well enough for me to follow 
it, except that (if I understand CJKV Information Processing 
correctly) JIS X 0208:1997 can be encoded in each of them, and 
JIS X 0212-1990 cannot be encoded in SJIS or ISO-2022-JP (RFC 1468), 
but can be in ISO-2022-JP-1 (RFC 2237) and ISO-2022-JP-2 (RFC 1554). 
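
One way to check that reading against an actual converter is to try 
encoding a JIS X 0212 character in each scheme. The sketch below uses 
Python's codecs, not iconv, and assumes U+4E02 is a fair sample (as 
far as I know it is in JIS X 0212 but not in JIS X 0208). It shows 
only what this one implementation supports, not what the standards 
say.

```python
# Assumed sample: U+4E02, a kanji in JIS X 0212 but not in JIS X 0208.
JISX0212_SAMPLE = "\u4e02"
# Ordinary JIS X 0208 text for comparison.
JISX0208_SAMPLE = "\u65e5\u672c\u8a9e"

# Python's names for the encodings discussed above.
CODECS = ("shift_jis", "euc_jp", "iso2022_jp", "iso2022_jp_1", "iso2022_jp_2")


def can_encode(text: str, codec: str) -> bool:
    """Report whether `codec` can represent every character of `text`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


coverage = {codec: can_encode(JISX0212_SAMPLE, codec) for codec in CODECS}

for codec in CODECS:
    print(f"{codec:14} JIS X 0208: {can_encode(JISX0208_SAMPLE, codec)}  "
          f"JIS X 0212: {coverage[codec]}")
```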

I object to having to know this much. I *personally* would rather 
have something usable than worry about perfection. The bottom line 
for me is that I create Japanese documents only in software that can 
save them in Unicode files. I have had far more than enough of 
incompatible encodings (for Japanese, Chinese, Russian, APL...) in my 
career.
-- 
Edward Cherlin
Generalist
"A knot! Oh, do let me help to undo it!"
--Alice in Wonderland
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
