Re: EUC-JP <-> Unicode roundtrip compatibility

Tomohiro KUBOTA Thu, 12 Apr 2001 05:50:28 -0700
Hi,

At Wed, 11 Apr 2001 13:55:39 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:

> Ah, I see. While the Unicode mapping tables provide round-trip
> compatibility to both JIS X 0208 and ASCII individually, they do not
> provide round-trip compatibility to an encoding such as EUC-JP that
> distinguishes between every element of JIS X 0208 and ASCII.
> 
> I think there is an easy and straightforward solution to this:
> 
> If (and only if) you map EUC-JP to Unicode, just replace in the JIS X
> 0208 mapping table the above line with
> 
>       0x815F  0x2140  0xFF3C  # FULLWIDTH REVERSE SOLIDUS
> 
> EUC-JP 0xA1 0xC0 maps to U+FF3C FULLWIDTH REVERSE SOLIDUS.

Then how about converting EUC-JP 0xA1 0xC0 into Shift_JIS
(or another CES that carries JIS X 0208, such as CP932), or vice versa?
Clearly, a JIS X 0208 character in EUC-JP must be mapped to the
same JIS X 0208 character in Shift_JIS.  This means that 0x2140
in Shift_JIS (0x81 0x5F) must also be mapped to U+FF3C, even though
U+005C (REVERSE SOLIDUS) is then never used.  This is another seed
of discord, because use of the fullwidth forms is discouraged.

IMO, there cannot be an easy and straightforward solution.
Sorry for the non-constructive discussion, but I really cannot
imagine that any satisfactory solution exists.
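
To make the byte-level relationship concrete, here is a minimal Python
sketch that converts one JIS X 0208 character from EUC-JP to Shift_JIS
using the usual arithmetic on the two-byte JIS code.  The function names
are made up for this example.  It only illustrates that EUC-JP 0xA1 0xC0
and Shift_JIS 0x81 0x5F are the same code point 0x2140, so a Unicode
mapping chosen for one necessarily affects the other.

```python
def eucjp_to_jis(euc: bytes) -> tuple[int, int]:
    """Strip the high bits of a two-byte EUC-JP (JIS X 0208) sequence."""
    b1, b2 = euc
    if not (0xA1 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE):
        raise ValueError("not a JIS X 0208 character in EUC-JP")
    return b1 & 0x7F, b2 & 0x7F


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Pack a JIS X 0208 code (both bytes in 0x21..0x7E) as Shift_JIS."""
    s1 = ((j1 + 1) >> 1) + 0x70
    if s1 > 0x9F:
        s1 += 0x40          # continue in the 0xE0.. lead-byte range
    if j1 & 1:              # odd row: first half of the trail-byte range
        s2 = j2 + 0x1F
        if j2 >= 0x60:
            s2 += 1         # skip 0x7F
    else:                   # even row: second half
        s2 = j2 + 0x7E
    return bytes([s1, s2])


def eucjp_to_sjis(euc: bytes) -> bytes:
    return jis_to_sjis(*eucjp_to_jis(euc))


j1, j2 = eucjp_to_jis(b"\xa1\xc0")
print(f"JIS X 0208: 0x{j1:02X}{j2:02X}")                     # 0x2140
print("Shift_JIS:", eucjp_to_sjis(b"\xa1\xc0").hex(" "))     # 81 5f
```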


> This seems to be the suitable corresponding Unicode character if you
> need round-trip compatibility between Unicode and JIS X 0208 + ASCII.
> 
> I think, on POSIX systems you most definitely want in a Unicode to
> EUC-JP conversion to map U+005C into the ASCII 0x5C, because lots of
> software assigns special semantics to this character (e.g., C string
> syntax "\n", etc.). It was my understanding that this is what glibc does
> (and its regression test suite even enforces) anyway.

Yes, this is what CP932 does.  The C language is also used on DOS and
Windows.  (In DOS, we can write printf("\%d\n",money); where "\" is
displayed as the yen mark.  The first "\" means "yen", while the second
"\" is the escape character.)
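
As an illustration of the EUC-JP rule quoted above, the following Python
sketch hard-codes only the two characters under discussion.  The table
and function names are hypothetical, and a real converter uses complete
tables; this only shows that, within EUC-JP, U+005C and U+FF3C can
round-trip because they land on distinct byte sequences.

```python
# Hypothetical two-entry excerpt of a Unicode <-> EUC-JP table.
UNICODE_TO_EUCJP = {
    "\u005c": b"\x5c",      # REVERSE SOLIDUS -> ASCII 0x5C
    "\uff3c": b"\xa1\xc0",  # FULLWIDTH REVERSE SOLIDUS -> JIS X 0208 0x2140
}
# The table is one-to-one, so it can simply be inverted.
EUCJP_TO_UNICODE = {v: k for k, v in UNICODE_TO_EUCJP.items()}


def roundtrips(ch: str) -> bool:
    """True if ch survives Unicode -> EUC-JP -> Unicode unchanged."""
    return EUCJP_TO_UNICODE[UNICODE_TO_EUCJP[ch]] == ch


for ch, raw in UNICODE_TO_EUCJP.items():
    print(f"U+{ord(ch):04X} -> {raw.hex(' ')}  round-trips: {roundtrips(ch)}")
```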


> If you do a Unicode to JIS X 0208 (not in the context of EUC-JP)
> conversion, *both* U+005C *and* U+FF3C should be mapped onto JIS X 0208
> code point 0x2140. This way you never lose anything.

Hmm, it will work if it can be implemented.  (However, this means
that the conversion table cannot be written as a one-to-one
correspondence.  Not only would all software have to be rewritten,
but Unicode's concept of conversion would also have to be redefined.)
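
The loss of one-to-one correspondence can be seen in a small Python
sketch.  The two dictionaries are hypothetical excerpts that follow the
quoted proposal (both U+005C and U+FF3C encode to 0x2140); choosing
U+FF3C for the decoding direction is an assumption made only for this
example.

```python
# Hypothetical excerpt: Unicode -> JIS X 0208, many-to-one as proposed.
ENCODE = {0x005C: 0x2140, 0xFF3C: 0x2140}
# The reverse direction can keep only one target per code point.
# Picking U+FF3C here is an assumption for this example.
DECODE = {0x2140: 0xFF3C}


def roundtrip(cp: int) -> int:
    """Unicode -> JIS X 0208 -> Unicode for a single code point."""
    return DECODE[ENCODE[cp]]


for cp in ENCODE:
    back = roundtrip(cp)
    status = "preserved" if back == cp else "changed"
    print(f"U+{cp:04X} -> 0x{ENCODE[cp]:04X} -> U+{back:04X} ({status})")
```

Nothing is dropped on the way into JIS X 0208, but the encoding table
can no longer be obtained by mechanically inverting the decoding table.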

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/