Hi,
At Tue, 10 Apr 2001 17:02:58 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:
> I knew that this would be coming. Plan B:
>
> ja.UTF-8@eucwidth
Width is not specific to EUC-JP or other EUC-based encodings.
> > More conventions will be needed because of confusing situation
> > of many conversion tables between Unicode and local encodings.
>
> Please detail. You have repeatedly mentioned problems with conversion
> tables without explaining a single one.
I see. I have not explained it in detail because the problem is so
complicated that I hardly understand it well enough to explain it to
others. For now, let me point to some web pages written in Japanese.
(You can still read the tables in these pages.)
http://www.asahi-net.or.jp/~hc3j-tkg/unicode/index.html
http://hp.vector.co.jp/authors/VA001240/article/ucsnote.html
http://www.autumn.org/etc/unidif.html
Note that these pages don't mention the width problem. Their focus
is the conflict between conversion tables. However, the width
problem I am thinking about derives from that conversion table
conflict.
I now see that we need a document written in English to explain
this problem. Though I don't know whether I can do it, I will try.
> As far as I can tell, the
> relevant mapping and unihan tables on http://www.unicode.org/Public/ are
> 100% bug-free by definition, as they were used to print the Han columns
> in ISO 10646-1.
As for the Unicode Consortium's conversion tables, it is impossible
to construct a round-trip compatible EUC-JP <-> UCS conversion table
from them. This is because
http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
has the following line:
0x815F 0x2140 0x005C # REVERSE SOLIDUS
This means that 0x2140 in JIS X 0208 (0xA1 0xC0 in EUC-JP) is
mapped to U+005C.
Note that EUC-JP is a CES (Character Encoding Scheme) whose
CCSs (Coded Character Sets) are ASCII and JIS X 0208 (and optionally
JIS X 0201 Kana and JIS X 0212). Since 0x5C in ASCII is also mapped
to U+005C, two different EUC-JP byte sequences share one UCS value.
Which should U+005C be converted back to in EUC-JP, 0x5C or 0xA1 0xC0?
I sent mail to [EMAIL PROTECTED] and received this reply: "This is
a known problem, and is very unfortunate. We don't have an official
way around this problem."
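To make the collision concrete, here is a minimal Python sketch. It is
only an illustration: the table is a two-entry, hand-written excerpt
(the ASCII identity entry plus the JIS0208.TXT line quoted above), not a
real conversion table, and the names DECODE, jis_to_euc and
find_collisions are made up for this mail.

```python
from collections import defaultdict

# Hypothetical hand-written excerpt of a decoding table: EUC-JP bytes -> UCS.
# The ASCII entry is the identity mapping; the JIS X 0208 entry follows the
# JIS0208.TXT line quoted above (0x2140 -> U+005C).
DECODE = {
    b"\x5c": 0x005C,      # ASCII 0x5C
    b"\xa1\xc0": 0x005C,  # JIS X 0208 0x2140, written as EUC-JP bytes
}

def jis_to_euc(jis):
    """Convert a JIS X 0208 code (e.g. 0x2140) to its EUC-JP byte pair."""
    return bytes([(jis >> 8) | 0x80, (jis & 0xFF) | 0x80])

def find_collisions(decode):
    """Return the UCS values reached from more than one EUC-JP sequence."""
    sources = defaultdict(list)
    for euc, ucs in decode.items():
        sources[ucs].append(euc)
    return {ucs: sorted(seqs) for ucs, seqs in sources.items() if len(seqs) > 1}

collisions = find_collisions(DECODE)
for ucs, seqs in collisions.items():
    # Each line printed here is a UCS value with no unique way back to EUC-JP.
    print("U+%04X <- %s" % (ucs, ", ".join(s.hex() for s in seqs)))
```

Any UCS value reported by find_collisions has more than one candidate
when converting back to EUC-JP, so a converter must pick one by some
convention outside the table, and the other byte sequence cannot
survive a round trip.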
> Better now? ;-)
A bit better.
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/