Hi,

At Tue, 10 Apr 2001 17:02:58 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:

> I knew that this would be coming. Plan B:
> 
>   ja.UTF-8@eucwidth

The width problem is not specific to EUC-JP or other EUC-based encodings.



> > More conventions will be needed because of confusing situation
> > of many conversion tables between Unicode and local encodings.
> 
> Please detail. You have repeatedly mentioned problems with conversion
> tables without explaining a single one.

I see.  I have not explained it in detail because the problem is so
complicated that I hardly understand it well enough to explain it to
others.

For now, let me point to some web pages written in Japanese.
(You can still read the tables in these pages.)

http://www.asahi-net.or.jp/~hc3j-tkg/unicode/index.html
http://hp.vector.co.jp/authors/VA001240/article/ucsnote.html
http://www.autumn.org/etc/unidif.html

Note that these pages don't mention the width problem.  Their focus
is the conflict between conversion tables.  However, the width
problem I am thinking about derives from that conversion table
conflict.

I now see that we need a document written in English to explain
this problem.  I don't know whether I can do it, but I will try.


> As far as I can tell, the
> relevant mapping and unihan tables on http://www.unicode.org/Public/ are
> 100% bug-free by definition, as they were used to print the Han columns
> in ISO 10646-1.

As for the Unicode Consortium's conversion tables, it is impossible
to construct a round-trip compatible EUC-JP <-> UCS conversion table
from them.  This is because

        http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT

has the following line:

        0x815F  0x2140  0x005C  # REVERSE SOLIDUS

This means that 0x2140 in JIS X 0208 (0xA1 0xC0 in EUC-JP) is
mapped to U+005C.

Note that EUC-JP is a CES (Character Encoding Scheme) whose
CCSs (Coded Character Sets) are ASCII and JIS X 0208 (and optionally
JIS X 0201 Kana and JIS X 0212).
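
To make the CES/CCS relationship concrete, here is a minimal Python
sketch that splits an EUC-JP byte string into (CCS, code) pairs.  The
function name split_eucjp is only an illustration, and the sketch
assumes well-formed input (it does not validate trailing bytes, and a
truncated sequence will raise IndexError).

```python
def split_eucjp(data: bytes):
    """Yield (ccs_name, code) pairs for a well-formed EUC-JP byte string."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            # Code set 0: ASCII, one byte
            yield ("ASCII", b)
            i += 1
        elif b == 0x8E:
            # Code set 2: SS2 followed by one JIS X 0201 Kana byte
            yield ("JIS X 0201 Kana", data[i + 1])
            i += 2
        elif b == 0x8F:
            # Code set 3: SS3 followed by two JIS X 0212 bytes
            code = ((data[i + 1] & 0x7F) << 8) | (data[i + 2] & 0x7F)
            yield ("JIS X 0212", code)
            i += 3
        elif 0xA1 <= b <= 0xFE:
            # Code set 1: two JIS X 0208 bytes with the high bits set
            code = ((b & 0x7F) << 8) | (data[i + 1] & 0x7F)
            yield ("JIS X 0208", code)
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")


for ccs, code in split_eucjp(b"\x5c\xa1\xc0"):
    print(f"{ccs}: 0x{code:02X}")
```

With the input above, the first byte is taken from ASCII (0x5C) and
the next two bytes from JIS X 0208 (0x2140), the two code points
discussed here.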

Since 0x5C in ASCII is also mapped to U+005C, which should U+005C
be converted to in EUC-JP, 0x5C or 0xA1 0xC0?
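
The ambiguity can be shown with a small Python sketch.  The table
below is a hand-made three-entry excerpt, not a full conversion table:
the ASCII entry is the identity mapping, and the two JIS X 0208
entries follow JIS0208.TXT.  The names jis0208_to_eucjp and
find_collisions are hypothetical helpers for this illustration only.

```python
from collections import defaultdict


def jis0208_to_eucjp(code: int) -> bytes:
    """Encode a JIS X 0208 code point in EUC-JP by setting both high bits."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])


# Tiny excerpt of an EUC-JP -> UCS table (illustration only).
EUCJP_TO_UCS = {
    b"\x5c": 0x005C,                   # ASCII 0x5C
    jis0208_to_eucjp(0x2140): 0x005C,  # JIS X 0208 0x2140 per JIS0208.TXT
    jis0208_to_eucjp(0x2121): 0x3000,  # JIS X 0208 0x2121, IDEOGRAPHIC SPACE
}


def find_collisions(table):
    """Return UCS code points reached from more than one EUC-JP sequence."""
    inverse = defaultdict(list)
    for euc, ucs in table.items():
        inverse[ucs].append(euc)
    return {ucs: sorted(eucs) for ucs, eucs in inverse.items() if len(eucs) > 1}


collisions = find_collisions(EUCJP_TO_UCS)
for ucs, eucs in collisions.items():
    print(f"U+{ucs:04X} <- " + ", ".join(e.hex() for e in eucs))
# prints: U+005C <- 5c, a1c0
```

Any UCS -> EUC-JP table built by inverting such a mapping has to pick
one of the two byte sequences, so the other one cannot survive a
round trip.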


I sent mail to [EMAIL PROTECTED] and received this reply: "This is
a known problem, and is very unfortunate.  We don't have an official
way around this problem."



> Better now? ;-)

A bit better.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
