Re: Unicode, character ambiguities

Edward Cherlin Fri, 11 Jan 2002 04:49:26 -0800

On Thursday 10 January 2002 02:43 pm, you wrote:
> Hi,
>
> At Thu, 10 Jan 2002 13:40:09 -0800,
>
> Edward Cherlin wrote:
> > > (Unfortunately, Microsoft Japanese fonts don't *have* a
> > > single-width backslash *at all*, which means terminal
> > > emulators--which typically don't want to deal with multiple
> > > fonts--are hard pressed to do anything like this at all. 
> > > Grrr.)
> >
> > These broken fonts must be replaced. That is the only way to get
> > proper display of Japanese using Japanese fonts in Unicode under
> > Windows. Unfortunately, they exist because some Japanese claim
> > that they are *necessary* for proper display of Japanese in
> > Unicode, which is nonsense.
>
> Yes, I think this is a mess.  However, Microsoft cannot change it.
> For example, I can write
> "the cost is \100 and the file is C:\text\abc.txt" or,


How is such code executed, then? It appears severely broken. No 
compiler can tell from this fragment which backslash is meant as a yen 
sign and which as a path separator, since \100 is also a legitimate 
filespec in Windows.

> printf("The cost is \\100.\n");
> Here, "\" in "\100" or "\\100" means the yen sign (and you may think
> it should be mapped to U+00A5), 

Exactly. I can't think of any other possibility. Do that and the 
whole problem goes away, so we can fix the fonts. You don't need the 
fonts fixed first, so it's up to the Japanese programming community 
to get their heads on straight and face up to their responsibilities. 
Otherwise we will have to declare the Japanese reputation for quality 
a myth.

> while code point U+005C
> is correct for the "\" in a file name or in "\n".  We cannot transcode
> such strings automatically. 

I doubt that, assuming that the compiler can tell what you want done. 
I think my son in college could code a Perl script for any particular 
programming language you need translated. If you need more help, I 
can get hold of some of the people who handled the date 
identification mess in all of the varieties of source code during the 
Y2K cleanup.
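A per-language conversion script of the kind suggested here could start out like the minimal C sketch below. It rests on a loudly stated assumption: inside a string literal, the escape \\ directly followed by a digit is taken to be a currency yen sign and is rewritten as the C99 universal character name \u00A5; everything else is copied unchanged. The names yen_to_ucn and sjis_lead are hypothetical, not an existing tool. It does skip Shift_JIS double-byte characters as units, because their trail byte can itself be 0x5C. It does not parse comments or character literals, and it would also rewrite a path containing a directory named 100, which is exactly the ambiguity raised about C:\text\abc.txt, so its output would still need human review.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Shift_JIS lead bytes are 0x81-0x9F and 0xE0-0xFC. The byte after a
   lead byte is a trail byte and may itself be 0x5C (e.g. 0x95 0x5C),
   so it must never be mistaken for a backslash. */
static int sjis_lead(unsigned char c) {
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* HYPOTHETICAL heuristic, not a known tool: inside a "..." literal, a
   doubled backslash immediately followed by an ASCII digit is assumed
   to be a currency yen sign and is rewritten as the C99 universal
   character name for U+00A5. All other bytes are copied unchanged.
   Comments and character literals are not parsed. Returns the output
   length, or (size_t)-1 if dst (capacity cap) is too small. */
size_t yen_to_ucn(const char *src, char *dst, size_t cap) {
    size_t n = 0, i = 0;
    int in_string = 0;                   /* currently inside "..."? */
    assert(src != NULL && dst != NULL && cap > 0);
    while (src[i] != '\0') {
        unsigned char c = (unsigned char)src[i];
        unsigned char next = (unsigned char)src[i + 1];
        const char *out = src + i;       /* bytes to emit */
        size_t outlen = 1, adv = 1;      /* bytes emitted, bytes consumed */
        if (sjis_lead(c) && next != '\0') {
            outlen = adv = 2;            /* double-byte char: copy as a unit */
        } else if (!in_string) {
            if (c == '"')
                in_string = 1;
        } else if (c == '\\' && next == '\\' &&
                   src[i + 2] >= '0' && src[i + 2] <= '9') {
            out = "\\u00A5";             /* guess: yen sign, not a backslash */
            outlen = 6;
            adv = 2;
        } else if (c == '\\' && next != '\0' && next < 0x80) {
            outlen = adv = 2;            /* any other escape pair: keep as is */
        } else if (c == '"') {
            in_string = 0;
        }
        if (n + outlen >= cap)
            return (size_t)-1;           /* leave room for the final NUL */
        memcpy(dst + n, out, outlen);
        n += outlen;
        i += adv;
    }
    dst[n] = '\0';
    return n;
}
```

Run over the quoted line printf("The cost is \\100.\n"); this pass produces printf("The cost is \u00A5100.\n"); while a literal such as "C:\\text\\abc.txt" is left alone because the doubled backslash is followed by a letter. Since \u takes exactly four hex digits, the following "100" is not swallowed into the escape.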

> Thus, Microsoft chose to map
> "\" to U+005C and to give U+005C a yen-sign glyph.
> I heard that Java also has such a mapping table.

If so, they are also part of the problem, rather than part of the 
solution. 
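The disagreement comes down to which code point the single byte 0x5C decodes to. In JIS X 0201 Roman, 0x5C is the yen sign and 0x7E is the overline; Microsoft's code page 932 table instead keeps both bytes as ASCII backslash and tilde, which is the mapping described in the quoted message. The sketch below, with hypothetical names, covers only the 0x00-0x7F range and shows why the choice matters: under the strict JIS X 0201 table, the bytes of the C escape "\n" decode to U+00A5 followed by "n", which a compiler reading Unicode source would no longer treat as an escape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: two policies for decoding the single-byte
   (0x00-0x7F) range of Shift_JIS into Unicode code points. */
enum sjis_policy {
    POLICY_JIS_X0201,    /* JIS X 0201 Roman: 0x5C = YEN SIGN, 0x7E = OVERLINE */
    POLICY_ASCII_COMPAT  /* Microsoft-style: 0x00-0x7F decode exactly as ASCII */
};

uint32_t map_single_byte(unsigned char b, enum sjis_policy p) {
    assert(b < 0x80);                 /* double-byte range not handled here */
    if (p == POLICY_JIS_X0201) {
        if (b == 0x5C) return 0x00A5; /* U+00A5 YEN SIGN */
        if (b == 0x7E) return 0x203E; /* U+203E OVERLINE */
    }
    return b;                         /* ASCII identity, e.g. 0x5C -> U+005C */
}

/* Decode an ASCII-range byte string into code points under one policy.
   The caller must supply an out array at least strlen(s) long. */
size_t decode_ascii_range(const char *s, uint32_t *out, enum sjis_policy p) {
    size_t n = 0;
    for (; *s != '\0'; s++)
        out[n++] = map_single_byte((unsigned char)*s, p);
    return n;
}
```

The glyph question (a font drawing U+005C as a yen sign) is separate from this table choice; the sketch only shows that the same byte yields different code points depending on which table is used.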

> I agree this is a severe violation of the Unicode standard, but I
> don't know any clean solution.

Fixing the source code at the source is a lot cleaner than inflicting 
your "fix" on the rest of the world. It's as bad as Oracle's attempt 
to define a standard for its variant UTF-8 (CESU-8, which apparently 
should be pronounced 'sezyu' in English). Their stated reason is the 
same, that it's too much work to fix all of their databases, and 
their cure is to lay even more work off on the rest of the world.
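For readers who have not met CESU-8: it is identical to UTF-8 for characters up to U+FFFF and differs only for supplementary characters. UTF-8 writes such a character as one 4-byte sequence; CESU-8 first splits it into a UTF-16 surrogate pair and then writes each surrogate as its own 3-byte sequence, which is what one gets by running a UTF-8 encoder over UTF-16 code units one at a time. The function names below are hypothetical, and the sketch assumes the input is a valid Unicode scalar value, with no error checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one value in 0..0xFFFF (surrogates allowed) as 1-3 bytes
   using the UTF-8 bit layout. */
static size_t put_bmp(uint32_t u, unsigned char *out) {
    if (u < 0x80) {
        out[0] = (unsigned char)u;
        return 1;
    }
    if (u < 0x800) {
        out[0] = (unsigned char)(0xC0 | (u >> 6));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (u >> 12));
    out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (u & 0x3F));
    return 3;
}

/* Standard UTF-8: a supplementary code point is one 4-byte sequence. */
size_t utf8_encode(uint32_t cp, unsigned char *out) {
    if (cp < 0x10000)
        return put_bmp(cp, out);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* CESU-8: split into a UTF-16 surrogate pair, then encode each
   surrogate as a separate 3-byte sequence (6 bytes in total). */
size_t cesu8_encode(uint32_t cp, unsigned char *out) {
    if (cp < 0x10000)
        return put_bmp(cp, out);
    uint32_t v = cp - 0x10000;
    size_t n = put_bmp(0xD800 + (v >> 10), out);      /* high surrogate */
    return n + put_bmp(0xDC00 + (v & 0x3FF), out + n); /* low surrogate */
}
```

For U+10400 the two encoders give F0 90 90 80 (UTF-8) and ED A0 81 ED B0 80 (CESU-8). Because conforming UTF-8 forbids encoded surrogates, a strict UTF-8 decoder must reject the CESU-8 form, so every consumer has to add a special case for it.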
> ---
> Tomohiro KUBOTA <[EMAIL PROTECTED]>
> http://www.debian.or.jp/~kubota/
> "Introduction to I18N" 
> http://www.debian.org/doc/manuals/intro-i18n/

-- 
Edward Cherlin
[EMAIL PROTECTED]
Does your Web site work?
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
