Fri, 13 Apr 2001 10:33:17 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:

> There is a bug in
> http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
> that causes round-trip compatibility problems if this table is used to
> convert EUC-JP into Unicode and back.

> Suggested fix: Replace in JIS0208.TXT the line
>   0x815F 0x2140 0x005C # REVERSE SOLIDUS
> with
>   0x815F 0x2140 0xFF3C # FULLWIDTH REVERSE SOLIDUS

I found the same problem in
http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0212.TXT .

If JISX0212 is used as a coded character set (CCS) of EUC-JP,
ISO-2022-JP-1, or ISO-2022-JP-2, replace in JIS0212.TXT the line
  0x2237 0x007E # TILDE
with
  0x2237 0xFF5E # FULLWIDTH TILDE

Note:
* The new Japanese standard JISX0213 also has TILDE. Converters that
  will support JISX0213 should take the same care.
* The CP932 code position 0x8160 (WAVE DASH, 0x2141 of JISX0208, which
  JIS0208.TXT maps to U+301C) is converted to U+FF5E (FULLWIDTH TILDE).
  http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
  It is clearly a BUG, but it will not be changed. ;-(
  When Unicode is converted to Japanese legacy charsets that do not
  include JISX0212 or JISX0213, it is useful to convert U+FF5E to
  WAVE DASH (0x2141) of JISX0208.

> I have not been able to check, what JIS X 0221-1995 says here, but I
> hope that they haven't made the same mistake.

These problems are not bugs, and JIS did not make mistakes.
JIS0208.TXT and JIS0212.TXT are right as mapping tables of the JIS
family. But it is a fact that they confuse us. Mapping tables for
character encoding schemes (CES), such as EUC-JP.TXT and ISO-2022-JP.TXT,
should be prepared instead of tables for CCS (JIS0208.TXT and
JIS0212.TXT).

At Thu, 12 Apr 2001 22:09:19 +0900,
Tomohiro KUBOTA <[EMAIL PROTECTED]> wrote:

> One harm from unuseful (but compliant to principle) conversion table
> (i.e., Unicode Consortium's JISX0208.TXT) is that it caused vendors
> to adopt various conversion tables (for EUC-JP).  Also there are
> many Shift_JIS variants (even though Unicode Consortium supplies
> SJIS.TXT.
> I don't know why.  This might be historical and political
> origin).  It is too late to change this situation...  Thus, at least
> I hope free software world will have a common consistent conversion
> table....

I hope so too. We have to have *a common, consistent* conversion table.
Being *common and consistent* is more important than being right or
wrong. If several tables are in use, round-trip conversion is never
guaranteed.

Before Unicode, at least the round trip for JISX0208 was completely
guaranteed. We (Japanese) did not feel especially happy about that,
because it is a natural, minimum function of any converter. After
Unicode, we still need it as a minimum function, even if a converter is
based on Unicode. Please make us "unhappy" again...

Fri, 13 Apr 2001 10:33:17 +0100,
Markus Kuhn <[EMAIL PROTECTED]> wrote:

> I do understand that JIS X 0201 lacks the two ASCII characters U+005C
> REVERSE SOLIDUS and U+007E TILDE (places U+00A5 YEN SIGN and U+203E
> OVERLINE there instead), but this simply makes JIS X 0201 unsuitable for
> use on POSIX platforms and cannot be an excuse for squeezing one (then
> why not both?) of these two single-width characters into the JIS X 0208
> mapping table.

I agree. Microsoft's opinion is probably the same, because 0x5C of CP932
(YEN SIGN) is converted to U+005C (REVERSE SOLIDUS) and 0x7E of CP932
(OVERLINE) is converted to U+007E (TILDE).

===

I'm sorry to talk only about Japanese charsets. But I think many other
charsets have the same problems.

For example, CP936.TXT (CP936 is known as GBK) is not completely
compatible with GB2312.TXT.

http://www.unicode.org/Public/MAPPINGS/EASTASIA/GB/GB2312.TXT
  0x2124 0x30FB # KATAKANA MIDDLE DOT
  0x212A 0x2015 # HORIZONTAL BAR
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT
  0xA1A4 0x00B7 #MIDDLE DOT
  0xA1AA 0x2014 #EM DASH

I don't know how the Chinese solve this problem.

For another example, CP950.TXT (CP950 is an extension of Big5) differs
from BIG5.TXT in many places.
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
http://www.unicode.org/Public/MAPPINGS/EASTASIA/OTHER/BIG5.TXT

I have heard that Big5 has some variants and that the BIG5.TXT at
Unicode.org is broken, but I don't know how this problem is solved.

Thanks,
-------------------------------------------
Hironori Sakamoto <[EMAIL PROTECTED]>
http://www2u.biglobe.ne.jp/~hsaka/

-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
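
The round-trip problem discussed in this message can be illustrated with
a minimal Python sketch. It is only an illustration under stated
assumptions: the two toy tables hold just the four EUC-JP byte sequences
in question (ASCII 0x5C and 0x7E, JISX0208 0x2140 encoded as 0xA1C0, and
JISX0212 0x2237 encoded as 0x8F 0xA2 0xB7), and the helper names
`build_reverse` and `lossy_entries` are hypothetical, not taken from any
real converter.

```python
# Toy EUC-JP -> Unicode tables (assumption: only the four entries under discussion).
# "original" follows JIS0208.TXT / JIS0212.TXT as published;
# "fixed" applies the two suggested replacements.
original = {
    b"\x5c": "\u005c",          # ASCII REVERSE SOLIDUS
    b"\x7e": "\u007e",          # ASCII TILDE
    b"\xa1\xc0": "\u005c",      # JISX0208 0x2140 -> U+005C
    b"\x8f\xa2\xb7": "\u007e",  # JISX0212 0x2237 -> U+007E
}
fixed = {
    b"\x5c": "\u005c",
    b"\x7e": "\u007e",
    b"\xa1\xc0": "\uff3c",      # FULLWIDTH REVERSE SOLIDUS
    b"\x8f\xa2\xb7": "\uff5e",  # FULLWIDTH TILDE
}


def build_reverse(to_unicode):
    """Build the Unicode -> EUC-JP table; the first entry wins on duplicates."""
    reverse = {}
    for raw, ch in to_unicode.items():
        reverse.setdefault(ch, raw)
    return reverse


def lossy_entries(to_unicode):
    """Return the byte sequences that do not survive EUC-JP -> Unicode -> EUC-JP."""
    reverse = build_reverse(to_unicode)
    return [raw for raw, ch in to_unicode.items() if reverse[ch] != raw]


if __name__ == "__main__":
    print("original:", lossy_entries(original))  # the two multi-byte entries
    print("fixed:   ", lossy_entries(fixed))     # empty list
```

With the published tables, the two multi-byte characters come back as
their ASCII counterparts; with the suggested replacements, every entry
has its own code point and the round trip is lossless.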
