Fri, 13 Apr 2001 10:33:17 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:
> There is a bug in
>   http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
> that causes round-trip compatibility problems if this table is used to
> convert EUC-JP into Unicode and back.
> Suggested fix: Replace in JIS0208.TXT the line
>   0x815F  0x2140  0x005C  # REVERSE SOLIDUS
> with
>   0x815F  0x2140  0xFF3C  # FULLWIDTH REVERSE SOLIDUS

I found the same problem in

  http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0212.TXT .

If JISX0212 is used as a coded character set (CCS) of EUC-JP,
ISO-2022-JP-1, or ISO-2022-JP-2, replace in JIS0212.TXT the line

  0x2237  0x007E  # TILDE

with

  0x2237  0xFF5E  # FULLWIDTH TILDE

Note:
* The new Japanese standard JISX0213 also has TILDE.
  Converters that will support JISX0213 should take similar care.
* The CP932 code position 0x8160 (WAVE DASH, mapped to U+301C by
  JIS0208.TXT) is converted to U+FF5E (FULLWIDTH TILDE).
    http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
  This is clearly a bug, but it will not be changed. ;-(
  When Unicode is converted to Japanese legacy charsets that do not
  include JISX0212 or JISX0213, it is useful to convert U+FF5E to
  U+301C (WAVE DASH), which JISX0208 can encode.
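
The fallback in the last note can be sketched in Python as follows.
This is only an illustration: it assumes Python's built-in "shift_jis"
and "cp932" codecs follow JIS0208.TXT and CP932.TXT for this character,
and the helper name encode_legacy is hypothetical.

```python
# Hypothetical helper: encode text for a legacy charset that has JISX0208
# but not JISX0212/JISX0213, folding U+FF5E into WAVE DASH first.
FALLBACK = {0xFF5E: 0x301C}  # FULLWIDTH TILDE -> WAVE DASH

def encode_legacy(text, codec="shift_jis"):
    """Apply the fallback, then encode with the given legacy codec."""
    return text.translate(FALLBACK).encode(codec)

# The two tables disagree about the byte sequence 0x8160.
wave_sjis = b"\x81\x60".decode("shift_jis")   # assumed to follow JIS0208.TXT
wave_cp932 = b"\x81\x60".decode("cp932")      # assumed to follow CP932.TXT
print(hex(ord(wave_sjis)), hex(ord(wave_cp932)))

# Text that came from CP932 still encodes back to the original bytes.
print(encode_legacy(wave_cp932))
```

Without the fallback, a character decoded through one table may have no
encoding at all in the other.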

> I have not been able to check, what JIS X 0221-1995 says here, but I
> hope that they haven't made the same mistake.

These problems are not bugs, and JIS did not make mistakes.
JIS0208.TXT and JIS0212.TXT are right as mapping tables of the JIS family.
But it is a fact that they confuse us.
Mapping tables for each character encoding scheme (CES), such as
EUC-JP.TXT and ISO-2022-JP.TXT, should be prepared instead of tables
for each CCS (JIS0208.TXT and JIS0212.TXT).
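
As a rough sketch of what a CES-level table could look like, the Python
below derives EUC-JP entries from CCS rows. The rows are the two lines
quoted in this thread, the overrides are the fixes suggested above, and
the names OVERRIDES and euc_jp_table are hypothetical. It assumes the
usual EUC-JP layout: JISX0208 codes shifted by 0x8080, and JISX0212
codes shifted by 0x8080 behind the 0x8F prefix.

```python
# Sample rows in the formats of JIS0208.TXT (SJIS, JIS, Unicode) and
# JIS0212.TXT (JIS, Unicode), copied from the lines quoted in this thread.
JIS0208_ROWS = ["0x815F  0x2140  0x005C  # REVERSE SOLIDUS"]
JIS0212_ROWS = ["0x2237  0x007E  # TILDE"]

# CES-level overrides suggested in this thread, keyed by (table, JIS code).
OVERRIDES = {("0208", 0x2140): 0xFF3C, ("0212", 0x2237): 0xFF5E}

def parse(row):
    """Return the hex fields of one table row, ignoring the comment."""
    return [int(field, 16) for field in row.split("#")[0].split()]

def euc_jp_table():
    """Build {EUC-JP bytes: Unicode code point} from the CCS rows."""
    table = {}
    for row in JIS0208_ROWS:
        _sjis, jis, uni = parse(row)
        uni = OVERRIDES.get(("0208", jis), uni)
        table[(jis + 0x8080).to_bytes(2, "big")] = uni
    for row in JIS0212_ROWS:
        jis, uni = parse(row)
        uni = OVERRIDES.get(("0212", jis), uni)
        # JISX0212 is reached through the 0x8F prefix in EUC-JP.
        table[b"\x8f" + (jis + 0x8080).to_bytes(2, "big")] = uni
    return table

for euc, uni in euc_jp_table().items():
    print(euc.hex(), "U+%04X" % uni)
```

Keeping the overrides separate leaves the CCS tables untouched, so they
stay correct as tables of the JIS family.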

At Thu, 12 Apr 2001 22:09:19 +0900,
Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:
> One harm from unuseful (but compliant to principle) conversion table
> (i.e., Unicode Consortium's JISX0208.TXT) is that it caused vendors
> to adopt various conversion tables (for EUC-JP).  Also there are
> many Shift_JIS variants (even though Unicode Consortium supplies
> SJIS.TXT.  I don't know why.  This might be historical and political
> origin).  It is too late to change this situation...  Thus, at least
> I hope free software world will have a common consistent conversion
> table....

I hope so too. We have to have *a common consistent* conversion table.
Being *common and consistent* is more important than being right or wrong.
If several different tables are in use, round-trip conversion can
never be guaranteed.
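
A small Python sketch of why this matters. The tables here are
hypothetical one-entry excerpts: TABLE_A follows the current JIS0208.TXT
line, TABLE_B follows the suggested fix, and ASCII stands for the
single-byte backslash that EUC-JP also contains.

```python
# Two decode tables for the same EUC-JP bytes 0xA1C0 (JISX0208 0x2140).
TABLE_A = {b"\xa1\xc0": 0x005C}   # as in the current JIS0208.TXT
TABLE_B = {b"\xa1\xc0": 0xFF3C}   # as in the suggested fix
ASCII = {b"\x5c": 0x005C}         # the single-byte backslash

def decoder(*tables):
    """Merge several byte -> code point tables into one decoder."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged

def encoder(dec):
    """Invert a decode table; later entries win on collisions."""
    return {uni: raw for raw, uni in dec.items()}

def round_trips(dec, enc):
    """True if every byte sequence survives decode followed by encode."""
    return all(enc.get(uni) == raw for raw, uni in dec.items())

dec_a = decoder(ASCII, TABLE_A)
dec_b = decoder(ASCII, TABLE_B)

print(round_trips(dec_a, encoder(dec_a)))  # two byte sequences share U+005C
print(round_trips(dec_b, encoder(dec_b)))  # each byte sequence has its own code point
print(round_trips(dec_b, encoder(dec_a)))  # decode with one table, encode with the other
```

Only the case where both directions use the same collision-free table
survives the round trip.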

Before Unicode, round-trip conversion was completely guaranteed, at
least for JISX0208. We (Japanese) did not feel especially happy about
that, because it is a natural, minimal function of any converter.
After Unicode, we still need it as a minimal function, even if a
converter is based on Unicode. Please let us take it for granted again...

Fri, 13 Apr 2001 10:33:17 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:
> I do understand that JIS X 0201 lacks the two ASCII characters U+005C
> REVERSE SOLIDUS and U+007E TILDE (places U+00A5 YEN SIGN and U+203E
> OVERLINE there instead), but this simply makes JIS X 0201 unsuitable for
> use on POSIX platforms and cannot be an excuse for squeezing one (then
> why not both?) of these two single-width characters into the JIS X 0208
> mapping table.

I agree. Microsoft's opinion is probably the same, because
0x5C of CP932 (YEN SIGN) is converted to U+005C (REVERSE SOLIDUS) and
0x7E of CP932 (OVERLINE) is converted to U+007E (TILDE).

===
I'm sorry to talk only about Japanese charsets,
but I think many other charsets have the same problems.
For example, CP936.TXT (CP936 is known as GBK) is not completely
compatible with GB2312.TXT.
  http://www.unicode.org/Public/MAPPINGS/EASTASIA/GB/GB2312.TXT
    0x2124  0x30FB  # KATAKANA MIDDLE DOT
    0x212A  0x2015  # HORIZONTAL BAR
  http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
    0xA1A4  0x00B7  #MIDDLE DOT
    0xA1AA  0x2014  #EM DASH
I don't know how Chinese users solve this problem.
As another example, CP950.TXT (CP950 is an extension of Big5) has many
differences from BIG5.TXT.
  http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
  http://www.unicode.org/Public/MAPPINGS/EASTASIA/OTHER/BIG5.TXT
I have heard that Big5 has some variants and that the BIG5.TXT at
unicode.org is broken, but I don't know how this problem is solved.
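
Differences like these can be listed mechanically. The Python below is
a minimal sketch that uses only the four rows quoted above. The helper
names load and diff are hypothetical, and the 0x8080 offset assumes
that GB2312.TXT lists 7-bit codes while CP936.TXT lists the 8-bit form,
as the quoted rows suggest.

```python
GB2312_ROWS = [
    "0x2124  0x30FB  # KATAKANA MIDDLE DOT",
    "0x212A  0x2015  # HORIZONTAL BAR",
]
CP936_ROWS = [
    "0xA1A4  0x00B7  #MIDDLE DOT",
    "0xA1AA  0x2014  #EM DASH",
]

def load(rows, offset=0):
    """Parse 'code  unicode  # comment' rows into {code + offset: unicode}."""
    table = {}
    for row in rows:
        fields = row.split("#")[0].split()
        if len(fields) >= 2:
            table[int(fields[0], 16) + offset] = int(fields[1], 16)
    return table

def diff(a, b):
    """Return {code: (unicode in a, unicode in b)} where the tables disagree."""
    shared = sorted(a.keys() & b.keys())
    return {code: (a[code], b[code]) for code in shared if a[code] != b[code]}

gb2312 = load(GB2312_ROWS, offset=0x8080)  # shift 7-bit codes to the 8-bit form
cp936 = load(CP936_ROWS)

for code, (u1, u2) in diff(gb2312, cp936).items():
    print("0x%04X  GB2312.TXT U+%04X  CP936.TXT U+%04X" % (code, u1, u2))
```

Run against the full files, the same comparison would show how far any
two tables for "the same" charset have drifted apart.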

Thanks,
------------------------------------------- 
Hironori Sakamoto <[EMAIL PROTECTED]> 
 http://www2u.biglobe.ne.jp/~hsaka/

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
