Tomohiro KUBOTA writes:

> > 3) For programs that interpret backslash as some kind of escape character
> >    and use Unicode internally but should work with text in Shift_JIS
> >    encoding, consider the multibyte character 0x5C as being the escape
> >    trigger, not [only] the Unicode character U+005C. This is already done
> >    in bash and gettext. For example, in GNU gettext, we have the code
> 
> I think interpretation of
> U+00A5 as an additional escape character doesn't always work, because
> Unicode texts don't have information on their origin (converted from
> Shift_JIS or not).

These are particular kinds of text files, namely those fed to programs
that do backslash interpretation: shell scripts, awk scripts, gettext
PO files, etc. And yes, if a literal Yen sign should appear there, it
needs to be doubled.

> If U+00A5 would always be an escape character,
> it would be harmful for many softwares.

Why is it more harmful if U+00A5 is an escape character than if U+005C
is an escape character? In both cases you just double it to get the
original character.
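
To make the doubling rule concrete, here is a minimal sketch in Python.
It is not the code from bash or GNU gettext; the function name
`unescape`, the set `ESCAPE_CHARS`, and the small table of recognized
escapes are assumptions chosen for illustration only.

```python
# Hypothetical sketch: treat both U+005C and U+00A5 as escape triggers.
ESCAPE_CHARS = {"\\", "\u00a5"}

# Only two named escapes are handled here, to keep the example short.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t"}


def unescape(text, escapes=ESCAPE_CHARS):
    """Interpret escape sequences; a doubled escape char yields one literal."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in escapes and i + 1 < len(text):
            nxt = text[i + 1]
            if nxt in escapes:
                # Doubled escape character: emit the second one literally.
                out.append(nxt)
            else:
                # Known escape letter, or else the character itself.
                out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character (or a lone trailing escape, kept as-is).
            out.append(ch)
            i += 1
    return "".join(out)
```

With this sketch, a doubled Yen sign comes out as one literal Yen sign,
in the same way that a doubled backslash comes out as one backslash.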

> I am interested in how European people succeeded to migrate from ISO 646
> variants into ISO 8859.  Yen Sign Problem is exactly a problem of ISO 646,
> because "0x5c = YEN SIGN" comes from JIS X 0201 Roman, which is Japanese
> variant of ISO 646.

For me, the migration occurred when I switched to using a different
computer with a different OS and a different character set. (From
ISO646-DE to CP437 at that time.) Few files were transported: there
are usually a lot of text files that you can just drop once every
three years. Among the remaining ones the disambiguation was usually easy,
depending on the type of file: In letters I only used umlauts and no
brackets, whereas in programs I mostly used brackets and no umlauts.
Only a few programs contained both brackets and umlauts, and I had to do
the fixup by hand, usually the next time I needed the particular
program.
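
The per-file-type rule can be sketched in Python as follows. This is an
illustration under stated assumptions, not the procedure I actually
used: the names `ISO646_DE` and `convert` and the `kind` labels are
hypothetical, and the output is a Unicode string rather than CP437. The
byte table is the German ISO 646 variant (DIN 66003).

```python
# Bytes whose meaning differs between ASCII and ISO646-DE (DIN 66003).
ISO646_DE = {
    0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
    0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß",
}


def convert(data, kind):
    """Decode 7-bit data according to the kind of file it came from.

    kind == "letter":  ambiguous bytes are read as umlauts etc.
    kind == "program": ambiguous bytes are read as ASCII brackets etc.
    Anything else is treated as mixed content needing manual fixup.
    """
    if kind == "letter":
        return "".join(ISO646_DE.get(b, chr(b)) for b in data)
    if kind == "program":
        return data.decode("ascii")
    raise ValueError("mixed content: fix up by hand")
```

For example, `convert(b"Gr}~e", "letter")` reads the bytes as a German
word, while `convert(b"a[0] = 1;", "program")` leaves the brackets
alone.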

So it was a minor annoyance over the course of a few months, but
nowhere near the cost that you are estimating.

Bruno
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
