Tomohiro KUBOTA writes:
> > 3) For programs that interpret backslash as some kind of escape character
> > and use Unicode internally but should work with text in Shift_JIS
> > encoding, consider the multibyte character 0x5C as being the escape
> > trigger, not [only] the Unicode character U+005C. This is already done
> > in bash and gettext. For example, in GNU gettext, we have the code
>
> I think interpretation of U+00A5 as an additional escape character
> doesn't always work, because Unicode texts don't have information on
> their origin (converted from Shift_JIS or not).
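Depending on the converter, the Shift_JIS byte 0x5C may come out as U+005C or as U+00A5. This is why the proposal quoted above treats the Yen sign as an additional escape trigger. The following is a minimal, hypothetical Python sketch of that idea. It is not the bash or gettext code that the quote mentions but does not reproduce. The function name `unescape`, its `triggers` parameter, and the two-entry escape table are assumptions made for illustration.

```python
# Hypothetical sketch, not taken from bash or gettext: resolve
# backslash-style escapes where both U+005C REVERSE SOLIDUS and
# U+00A5 YEN SIGN may act as the escape trigger.

BACKSLASH = "\u005c"
YEN_SIGN = "\u00a5"

# Assumed default: honour both characters. A program that only
# honours the backslash would pass (BACKSLASH,) instead.
ESCAPE_TRIGGERS = (BACKSLASH, YEN_SIGN)

# Small illustrative table of named escapes (assumption).
SIMPLE_ESCAPES = {"n": "\n", "t": "\t"}


def unescape(text: str, triggers: tuple = ESCAPE_TRIGGERS) -> str:
    """Resolve escapes in `text`, treating every character in
    `triggers` as an escape character."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in triggers and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt == ch:
                # A doubled trigger stands for one literal trigger.
                out.append(ch)
            elif nxt in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[nxt])
            else:
                # Unknown escape: keep both characters unchanged.
                out.append(ch + nxt)
            i += 2
        else:
            # Ordinary character, or a trigger at the very end.
            out.append(ch)
            i += 1
    return "".join(out)
```

With the default triggers, `unescape("100\u00a5\u00a5")` yields `100¥`, just as a doubled backslash yields a single backslash. The risk Kubota describes shows up with input such as `"100\u00a5nyen"`. If U+00A5 is a trigger, the `¥n` becomes a newline. With `triggers=(BACKSLASH,)` the text is left untouched.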
The texts in question are particular kinds of text files that are fed to
programs doing backslash interpretation: shell scripts, awk scripts,
gettext PO files, etc. And yes, if a Yen sign is to appear literally in
such a file, it needs to be doubled.

> If U+00A5 would always be an escape character,
> it would be harmful for many softwares.

Why is it more harmful if U+00A5 is an escape character than if U+005C
is an escape character? In both cases you just double it to get the
original character.

> I am interested in how European people succeeded to migrate from ISO 646
> variants into ISO 8859. Yen Sign Problem is exactly a problem of ISO 646,
> because "0x5c = YEN SIGN" comes from JIS X 0201 Roman, which is Japanese
> variant of ISO 646.

For me, the migration happened when I switched to a different computer
with a different OS and a different character set (from ISO646-DE to
CP437 at that time). Few files were carried over; there are usually a lot
of text files that you can simply drop once every few years. Among the
remaining ones, the disambiguation was usually easy, depending on the
type of file: in letters I used only umlauts and no brackets, whereas in
programs I mostly used brackets and no umlauts. Only a few programs
contained both brackets and umlauts, and I had to fix those up by hand,
usually the next time I needed the particular program. So it was a minor
annoyance spread over a few months, but nowhere near the costs that you
are estimating.

Bruno
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
