Re: Unicode, character ambiguities

Glenn Maynard Fri, 11 Jan 2002 21:22:22 -0800

On Fri, Jan 11, 2002 at 07:45:44PM -0800, H. Peter Anvin wrote:
> > You have to assume that most Japanese systems will display \ as a Yen symbol,
> > because they will.
> > 
> > Now, translation tables for CP932 on these systems could translate
> > backslash and the yen symbol both to the yen symbol; that way, other
> > people would see what that user saw, and that user will get yen symbols
> > back.  But then you break round-trip; when you go back to CP932, you
> > don't know whether it was originally a backslash code or a yen code.
> > 
> 
> You don't know that in the first place, apparently; so what difference
> does it make?


If the user is on a Windows system, in CP932, then it's extremely rare
for \ to be anything but a yen symbol, even in a fully Unicode program.

However, my "originally" comment was slightly confused.  It looked like
Windows had two CP932 codepoints for the yen symbol; actually, it has
one (0x5C), and there are two Unicode codepoints that display as one
(U+005C and U+00A5).  In a protocol that doesn't treat \ as special,
translating CP932 0x5C to Unicode U+00A5 is probably OK.  (Users
operating in this codepage would have no way to input a real backslash,
but they can't do that anyway.)
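
To make the mapping concrete, here is a minimal sketch in Python.  The
function names are hypothetical, and it assumes Python's built-in "cp932"
codec, which decodes byte 0x5C to U+005C.  The swap is done on the decoded
string rather than on raw bytes, because 0x5C also occurs as the trail
byte of some double-byte CP932 characters.

```python
YEN = "\u00a5"        # YEN SIGN
BACKSLASH = "\u005c"  # REVERSE SOLIDUS

def cp932_to_unicode(data: bytes) -> str:
    """Decode CP932 and treat single-byte 0x5C as a yen sign (assumed policy)."""
    # Decode first: a 0x5C trail byte inside a double-byte character is
    # consumed by the codec and never appears as U+005C in the result.
    text = data.decode("cp932")
    return text.replace(BACKSLASH, YEN)

def unicode_to_cp932(text: str) -> bytes:
    """Encode to CP932, folding U+00A5 back onto byte 0x5C."""
    # Both U+005C and U+00A5 end up as 0x5C, so the distinction is lost here.
    return text.replace(YEN, BACKSLASH).encode("cp932")
```

Note that unicode_to_cp932 sends both U+005C and U+00A5 to the same byte,
so anything that arrives from the Unicode side as a real backslash comes
back from CP932 as a yen sign.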

This doesn't help the problem of people inputting Unicode U+005C when
they really want a yen symbol, and they're very likely to do that on a
Japanese Windows system, where both U+005C and U+00A5 look like it, and
U+005C is more widely used.  The ideal solution would be to get them to
stop doing that (and believe me, I want that as much as anyone else!).
A more pragmatic solution is to display U+005C as a yen symbol when the
language is Japanese.

I agree that this has the unfortunate side effect of perpetuating the
problem, and I don't like that, either.
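
The display-time substitution described above might be sketched like this
in Python.  The function name and the language-tag handling are
assumptions for illustration only; the stored text is left untouched and
only the rendered string changes.

```python
def for_display(text: str, lang: str) -> str:
    """Return the string to render; the underlying text is not modified."""
    # Assumption: lang is a tag such as "ja", "ja-JP" or "ja_JP".
    primary = lang.replace("_", "-").split("-")[0].lower()
    if primary == "ja":
        # Show U+005C the way a Japanese Windows font would: as a yen sign.
        return text.replace("\u005c", "\u00a5")
    return text
```

Because the substitution happens only when rendering, a parser or protocol
handler still sees U+005C, but the user cannot tell the two codepoints
apart on screen.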

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
