Hi,

At Wed, 03 Apr 2002 12:59:01 +0100,
Markus Kuhn wrote:
> > SJIS   UCS
> > 8A43   6D77 (in CJK UNIFIED IDEOGRAPHS region)
> > EBA2   FA45 (in CJK COMPATIBILITY IDEOGRAPHS region)
> >
> > Usage of SJIS 8A43 means that the character is *not* SJIS EBA2.
>
> What is the exact difference between these two? Are these two ideographs
> distinguished in any major Japanese dictionary? Unihan suggests that
> this is not the case. Where do these two ideographs come from? What do
> you need SJIS EBA2 for and when/why was it added?

Since I don't know every word that Japanese people might use, I cannot
give a strict answer. However, I have heard that JIS X 0213 was developed
through extensive investigation, including careful research into real
usage. I could not find an English document describing how it was
developed, but the following document may help you:

  http://www.cse.cuhk.edu.hk/~irg/irg/N689_WG2N2095_KanjiUnified.pdf

This document asks Unicode to add 56 compatibility characters (including
U+FA45) for JIS X 0213 round-trip compatibility. (Note that this proposal
was later revised to add 61 characters.)

  However, the practical world (in some of legal application in Japan)
  need to recognize those ideographs as a separated ideograph from the
  ideographs which are already shown in 10646 as sample glyph shape.

I imagine "legal application" refers to legal documents that need to
specify the names of persons and places. I cannot discuss this point
further. However, I can say that the people who developed the Japanese
standard judged these characters to be necessary in Japanese people's
everyday life.

> Unicode has two compatibility ideographs for U+6D77, namely U+FA45 and
> U+2F901, both of which are mapped to U+6D77 in Unicode Normalization Form KC.
>
>   FA45;CJK COMPATIBILITY IDEOGRAPH-FA45;Lo;0;L;6D77;;;;N;;;;;
>   2F901;CJK COMPATIBILITY IDEOGRAPH-2F901;Lo;0;L;6D77;;;;N;;;;;
>
>   http://www.unicode.org/unicode/reports/tr15/
>
> Aren't these two compatibility ideographs enough to unambiguously
> preserve the SJIS glyph information that you worry about?
Oh, I didn't know that. I think the sample glyph for U+2F901 that you
showed me satisfies SJIS 8A43. However, I don't know which unification
rule applies to U+2F901.

I am afraid you don't understand what I mean. Every ideograph in Unicode
and in other character set standards is subject to a unification rule,
so every ideograph in Unicode has its own range of tolerable glyph
variation. Because the glyph shown in the Unicode PDF files is a single
concrete glyph, it cannot express that range. Since U+FA45 is annotated
in the Unicode PDF file as "JIS X 0213 compatibility additions", its
unification rule is guaranteed to be exactly the same as that of SJIS
EBA2. However, U+2F901 has no such annotation, so I don't know whether
its rule is the same as that of SJIS 8A43.

> Would changing IBM's mapping table to map SJIS 8A43 -> U+2F901 fix your
> concern? It is another way of preserving round-trip compatibility, and
> after normalization (for those users who don't care about round-trip
> compatibility to SJIS), you end up with the exact same Unicode text.
> The example glyph for U+2F901 used in ISO 10646-2:2001 on page 370
> looks more similar to the glyph for U+6D77 used on page 926 of
> the Unicode 3.0 book than the glyph used on page 666 of Unicode 3.0.

I don't think changing IBM's mapping table is a good idea. I only wanted
to say that JIS X 0213 cannot be defined exactly as a conversion table
to Unicode, although it is possible to build a usable approximate
conversion table between JIS X 0213 and Unicode.

> Is there something wrong with the glyph for U+6D77 used on page 926 of
> the Unicode 3.0 book? Where is the SJIS EBA2 officially defined? The
> Unicode 3.0 Shift-JIS index ends at EAA4.

The glyph is completely OK from the viewpoint of U+6D77's unification
rule. It just doesn't fall within the range of tolerable glyph variation
of SJIS 8A43.

SJIS EBA2 is not part of plain Shift_JIS. Sorry, I omitted part of the
discussion: I am talking about JIS X 0213.
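To make the round-trip concern concrete, here is a minimal Python sketch. It assumes a hypothetical two-entry Shift_JISX0213-to-Unicode table holding only the two mappings quoted earlier in this thread (8A43 -> U+6D77, EBA2 -> U+FA45); it is not IBM's or any vendor's real table, and `round_trip` is an illustrative helper name. Without normalization both codes survive the round trip; once NFKC is applied, U+FA45 becomes U+6D77 and EBA2 comes back as 8A43.

```python
import unicodedata
from typing import Optional

# HYPOTHETICAL excerpt of a Shift_JISX0213 -> Unicode table, limited to the
# two mappings discussed in this thread (not a vendor's real table).
SJIS_TO_UCS = {
    b"\x8a\x43": "\u6d77",  # SJIS 8A43 -> U+6D77 (CJK Unified Ideographs)
    b"\xeb\xa2": "\ufa45",  # SJIS EBA2 -> U+FA45 (CJK Compatibility Ideographs)
}
# Reverse table used to encode back to Shift_JISX0213.
UCS_TO_SJIS = {ucs: sjis for sjis, ucs in SJIS_TO_UCS.items()}


def round_trip(sjis: bytes, normalize: bool) -> Optional[bytes]:
    """Decode one SJIS code, optionally apply NFKC, then encode it back.

    Returns None if the resulting text has no entry in the reverse table.
    """
    text = SJIS_TO_UCS[sjis]
    if normalize:
        text = unicodedata.normalize("NFKC", text)
    return UCS_TO_SJIS.get(text)


if __name__ == "__main__":
    for code in SJIS_TO_UCS:
        raw = round_trip(code, normalize=False)
        nfkc = round_trip(code, normalize=True)
        print(code.hex().upper(), "raw ->", raw.hex().upper(),
              "| NFKC ->", nfkc.hex().upper())
    # U+2F901, the other compatibility ideograph mentioned above, folds too.
    print(unicodedata.normalize("NFKC", "\U0002F901") == "\u6d77")
```

Running it prints `EBA2 raw -> EBA2 | NFKC -> 8A43`, i.e. the distinction survives only as long as nobody normalizes. Because the decomposition field in the UnicodeData.txt lines quoted above has no `<compat>` tag, these are canonical singleton decompositions, so NFC folds them the same way as NFKC.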
To be clear, "SJIS" here means the Shift_JISX0213 encoding, an extension
of Shift_JIS that covers JIS X 0213. It is described in JIS X 0213:2000
Annex 1 (informative); see:

  http://isweb11.infoseek.co.jp/computer/wakaba/jisx0213-2000/jisx0213-2000.html
  http://isweb11.infoseek.co.jp/computer/wakaba/jisx0213-2000/jisx0213-2000.u8.html

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
