On Sat, 11 May 2002, Markus Kuhn wrote:

> I have found some ways of lobbying for specific technical issues
> within Microsoft and sometimes manage to get directly in contact
....
> I'd be happy to add the yen/backslash issue to this list.

Exactly the same problem exists for Korean Won/backslash. KS X 1003 (ISO 646-KR) has Won at 0x5c, just as JIS X 0201 (ISO 646-JP) has Yen at 0x5c. MS fonts for Korean have the Won sign at U+005C instead of backslash, which really annoys MS-Windows TeX users, among others. Those MS fonts have another (half-width) Korean Won sign at U+20A9 as well as the full-width sign at U+FFE6. Given these, it may help you get your suggestion across to MS people if you take up the two problems 'in a single stroke'. (Looking into SimSun and MingLiU for SC and TC, I found that zh locales don't have this problem.)

BTW, the ko_KR locale definition in glibc 2.2.x has to use U+FFE6 in LC_MONETARY because KS X 1001, which is used in EUC-KR, doesn't have a character corresponding to U+20A9. Of course, we wouldn't have to if everybody used UTF-8 locales exclusively.

> However, I will need someone who writes me a detailed report and
> analysis of this issue and presents a well-formulated case for
> why current practice is wrong, what exactly should be changed,

I'm sorry I can't help you much with this because I know only as much about the Japanese situation as you do. However, I can give you a suggestion as to how to solve this problem for Japanese and Korean keyboards/IMEs. I'm not saying this is all that has to be done, but it can be a part of what needs to be done.

Japanese and Korean users have to switch between Japanese (Korean) input mode and English input mode (think of them as two keyboard groups in Xkb). In English input mode, the key labelled with 'vertical bar and Yen (Won)' should produce backslash (U+005C), whereas in Japanese (Korean) input mode it should generate Yen (U+00A5) or Won (U+20A9), respectively. Japanese and Korean keyboards for new computers should have *three* characters marked on that key (perhaps with the Yen or Won sign in a different color than the other two characters to indicate that it can only be entered in Japanese or Korean input mode).
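
The half-width part of this key mapping can be sketched as a small lookup table. This is only an illustration of the proposal, not how any real IME or Xkb configuration is written; the layout names ("ja", "ko"), the mode names ("english", "native") and the function name are hypothetical.

```python
# Code points named in the message above.
BACKSLASH = "\u005c"
YEN = "\u00a5"
WON = "\u20a9"

# Hypothetical (layout, input mode) -> character table for the key
# labelled 'vertical bar and Yen (Won)'. The names are assumptions
# made for this sketch only.
KEY_OUTPUT = {
    ("ja", "english"): BACKSLASH,
    ("ja", "native"): YEN,
    ("ko", "english"): BACKSLASH,
    ("ko", "native"): WON,
}


def currency_key_output(layout: str, mode: str) -> str:
    """Return the character the key should produce under the proposal."""
    return KEY_OUTPUT[(layout, mode)]


if __name__ == "__main__":
    for (layout, mode), ch in sorted(KEY_OUTPUT.items()):
        print(f"{layout:2} {mode:8} U+{ord(ch):04X}")
```

The point of the table is that U+005C always means backslash, whatever the layout, and the currency signs are reachable only in the native input mode.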

Japanese and Korean IMEs also have a full-width mode, in which pressing the key should produce the fullwidth Yen or Won. (In this scheme, the fullwidth backslash can't be entered, but who needs it? If one really wants to enter it, one can use the codemap or something like that.) It may take some getting used to for Japanese and Korean users who are used to typing the directory separator between Japanese or Korean path components, but not many people do that under Windows (they just drag'n'drop, click, etc.).

Now somebody might raise an objection to this because Shift_JIS and CP949 (the extension of EUC-KR used in MS-Windows) don't have U+00A5 and U+20A9. They'd say that with this change, all of a sudden emails and html files in those encodings can't include Yen and Won. For html files, this is not a valid objection because no matter what encoding is used, one can always use NCRs to include any character in Unicode. Web authoring tools should take care of this problem. Simple text editors should warn users that files to be saved in Shift_JIS or CP949 include characters not representable in those encodings and that they're about to be replaced with something like '\u00a5' (or '\u20a9'). For emails in plain text in legacy encodings, they can use the fullwidth Yen and Won (a smart email program would do that if users insist on using Shift_JIS and CP949/EUC-KR; otherwise, it can send emails in UTF-8).

As for existing web pages and documents, I don't know what the best solution is, except that as time goes by people will gradually convert them as necessary. It'll be great if they go all the way to UTF-8 (or other UTFs). If not, at least they can use NCRs in html files. As others have written, this conversion needs some form of 'AI'(?), but I guess there are not many documents in which 0x5c doubles as both the directory separator (or escape character) and Yen/Won. I'm not sure whether this will help the transition or not.
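
The two fallbacks for legacy encodings (NCRs for html, fullwidth signs for plain-text mail) can be sketched with Python's standard codecs. This is a minimal sketch under assumptions: the function names and the FULLWIDTH table are made up for illustration, and the result for a given legacy codec depends on that codec's own mapping table, which may treat these characters differently from what is described here.

```python
# Half-width currency signs and their fullwidth counterparts
# (U+FFE5 FULLWIDTH YEN SIGN, U+FFE6 FULLWIDTH WON SIGN).
FULLWIDTH = {"\u00a5": "\uffe5", "\u20a9": "\uffe6"}


def encodable(ch: str, encoding: str) -> bool:
    """True if the codec accepts this character as-is."""
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def html_bytes(text: str, encoding: str) -> bytes:
    """For html: anything the codec rejects becomes a decimal NCR."""
    return text.encode(encoding, errors="xmlcharrefreplace")


def plain_text_bytes(text: str, encoding: str) -> bytes:
    """For plain-text mail: swap a rejected Yen/Won for its fullwidth form.

    Raises UnicodeEncodeError if the fullwidth form is not encodable
    either, so the caller can warn the user or switch to UTF-8.
    """
    out = []
    for ch in text:
        if not encodable(ch, encoding) and ch in FULLWIDTH:
            ch = FULLWIDTH[ch]
        out.append(ch)
    return "".join(out).encode(encoding)


if __name__ == "__main__":
    price = "\u20a9" + "1000"
    print(html_bytes(price, "ascii"))        # NCR fallback
    print(plain_text_bytes(price, "utf-8"))  # nothing to replace
```

ASCII is used in the demo only because it certainly lacks both signs; a mail program would pass the user's chosen legacy encoding instead and catch the exception to warn the user.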

However, as an interim measure, MS and foundries could make TTCs (TrueType collections) for Japanese and Korean that have two variants, one with backslash at U+005C and the other with Yen/Won at U+005C. It would increase the size of a TTC by ~100 bytes (well, it could be a few kB, but it doesn't really matter because Japanese and Korean TTCs are usually well over 1MB) because the two variants share all the glyphs except for the one for U+005C. Alternatively, the TrueType GSUB(?) table entry for U+005C could be used. Perhaps, to 'promote' the transition, the default should be the one with backslash at U+005C, and the other variant should have a special marker attached to its name (say, '$').

This convention is not my invention. '@' at the beginning of Korean font names (and I believe this is also the case for Japanese fonts) denotes variants for vertical writing. I don't know whether this is just a convention used in MS-Windows or is a part of the TrueType or OpenType spec (the latter is not likely).

Before sending this off, I'm going to add another data point on this issue. Some Korean Unix/Linux users got used to interpreting 'backslash' as Won when viewing Korean web pages because most, if not all, X11 BDF fonts (and some TrueType fonts made for Unix/Linux users) have a backslash at 0x5c (well, in the case of BDF fonts, they're ISO-8859-x fonts, so they should). This is the opposite of what TeX users under MS-Windows got used to. However, this may not be the case for Japanese Unix/Linux users because there are JIS X 0201 fonts (there's no KS X 1003 font).

Hope this helps you a little with raising the issue with MS,

Jungshik

--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
