Hi,

At Wed, 03 Apr 2002 12:59:01 +0100,
Markus Kuhn wrote:

> > SJIS    UCS
> > 8A43    6D77 (in CJK UNIFIED IDEOGRAPHS region)
> > EBA2    FA45 (in CJK COMPATIBILITY IDEOGRAPHS region)
> 
> > Usage of SJIS 8A43 means that the character is *not* SJIS EBA2.
> 
> What is the exact difference between these two? Are these two ideographs
> distinguished in any major Japanese dictionary? Unihan suggests that
> this is not the case. Where do these two ideographs come from? What do
> you need SJIS EBA2 for and when/why was it added?

Since I do not know every word that Japanese people might use, I
cannot give a definitive answer.  However, I have heard that JIS X 0213
was developed after extensive investigation and careful research into
real usage.  Though I could not find a document in English on how it
was developed, the following document may help you:
http://www.cse.cuhk.edu.hk/~irg/irg/N689_WG2N2095_KanjiUnified.pdf
It asks Unicode to add 56 compatibility characters (including U+FA45)
for JIS X 0213 round-trip compatibility.
(Note that this proposal was later revised to add 61 characters.)

  However, the practical world (in some of legal application in Japan)
  need to recognize those ideographs as a separated ideograph from the
  ideographs which are already shown in 10646 as sample glyph shape.

I imagine "legal application" refers to legal documents that need to
specify the names of persons and places.  I cannot discuss this point
further.  However, I can say that the people responsible for the
Japanese standard judged that these characters are needed in Japanese
people's real life.



> Unicode has two compatibility ideographs for U+6D77, namely U+FA45 and
> U+2F901, both of which are mapped to U+6D77 in Unicode Normalization Form KC.
> 
> FA45;CJK COMPATIBILITY IDEOGRAPH-FA45;Lo;0;L;6D77;;;;N;;;;;
> 2F901;CJK COMPATIBILITY IDEOGRAPH-2F901;Lo;0;L;6D77;;;;N;;;;;
> 
> http://www.unicode.org/unicode/reports/tr15/
> 
> Aren't these two compatibility ideographs enough to unambiguously
> preserve the SJIS glyph information that you worry about?

Oh, I didn't know that.  I think the sample glyph for U+2F901 that
you showed me satisfies SJIS 8A43.  However, I don't know which
unification rule applies to U+2F901.
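
A minimal sketch of the normalization behaviour described in the quoted
text, assuming Python's standard unicodedata module and a Unicode
database recent enough to contain both characters.  The helper name
folds_to_unified is hypothetical.  Since the decomposition field in the
two UnicodeData lines above carries no compatibility tag, the mapping
is a canonical one, so the sketch checks NFC and NFD as well as NFKC
and NFKD.

```python
import unicodedata

UNIFIED = "\u6D77"                 # CJK UNIFIED IDEOGRAPH-6D77
COMPAT = ["\uFA45", "\U0002F901"]  # the two compatibility ideographs

def folds_to_unified(ch):
    """True if every normalization form maps ch to U+6D77."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return all(unicodedata.normalize(f, ch) == UNIFIED for f in forms)

for ch in COMPAT:
    # decomposition() returns the raw mapping field from UnicodeData.txt
    print("U+%04X" % ord(ch), unicodedata.decomposition(ch),
          folds_to_unified(ch))
```

This only shows that the code points collapse under normalization.  It
says nothing about which glyph variations each code point tolerates.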

I am afraid you don't understand what I mean.  Every ideograph in
Unicode and in other character set standards has its own unification
rule, and thus every ideograph in Unicode has its own range of
tolerable glyph variation.  Because the glyph shown in the Unicode PDF
files is a single concrete glyph, it cannot express that range.  Since
U+FA45 is annotated in the Unicode PDF file as "JIS X 0213
compatibility additions", it is guaranteed that the unification rule
of U+FA45 is exactly the same as that of SJIS EBA2.  However, since
U+2F901 has no such comment, I don't know whether its rule is the same
as that of SJIS 8A43.


> Would changing IBM's mapping table to map SJIS 8A43 -> U+2F901 fix your
> concern? It is another way of preserving round-trip compatibility, and
> after normalization (for those users who don't care about round-trip
> compatibility to SJIS), you end up with the exact same Unicode text.
> The example glyph for U+2F901 used in ISO 10646-2:2001 on page 370
> looks more similar to the glyph for U+6D77 used on page 926 of
> the Unicode 3.0 book than the glyph used on page 666 of Unicode 3.0.

I don't think changing IBM's mapping table is a good idea.  I only
wanted to say that it is impossible to define JIS X 0213 by means of a
conversion table to Unicode, though it is possible to build a usable
approximate conversion table between JIS X 0213 and Unicode.
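
To illustrate why such a table is only an approximation once
normalization is involved, here is a small sketch.  It assumes a toy
table containing only the two pairs quoted at the top of this message;
the names SJISX0213_TO_UCS, UCS_TO_SJISX0213 and round_trip are
hypothetical and do not come from any real converter.

```python
import unicodedata

# Toy table: only the two code pairs quoted in this thread.
SJISX0213_TO_UCS = {0x8A43: "\u6D77", 0xEBA2: "\uFA45"}
UCS_TO_SJISX0213 = {u: s for s, u in SJISX0213_TO_UCS.items()}

def round_trip(code, normalize=False):
    """Convert a code to Unicode and back, optionally normalizing."""
    text = SJISX0213_TO_UCS[code]
    if normalize:
        # U+FA45 has a canonical decomposition to U+6D77
        text = unicodedata.normalize("NFC", text)
    return UCS_TO_SJISX0213[text]

print("%04X" % round_trip(0xEBA2))
print("%04X" % round_trip(0xEBA2, normalize=True))
```

Without normalization EBA2 survives the round trip; with normalization
it comes back as 8A43, so the distinction between the two is lost.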


> Is there something wrong with the glyph for U+6D77 used on page 926 of
> the Unicode 3.0 book? Where is the SJIS EBA2 officially defined? The
> Unicode 3.0 Shift-JIS index ends at EAA4.

The glyph is completely OK from the viewpoint of U+6D77's unification
rule.  However, it falls outside the range of tolerable glyph
variation of SJIS 8A43.

SJIS EBA2 is not part of Shift_JIS.  Sorry that I omitted part of the
discussion.  I am talking about JIS X 0213, so here "SJIS" means the
Shift_JISX0213 encoding, which is an extension of Shift_JIS for
JIS X 0213.  It is described in JIS X 0213:2000 Annex 1 (informative).

http://isweb11.infoseek.co.jp/computer/wakaba/jisx0213-2000/jisx0213-2000.html
http://isweb11.infoseek.co.jp/computer/wakaba/jisx0213-2000/jisx0213-2000.u8.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
