Re: Unicode, character ambiguities

Glenn Maynard Sun, 13 Jan 2002 02:01:59 -0800

On Sun, Jan 13, 2002 at 03:38:55AM -0600, [EMAIL PROTECTED] wrote:
> >Now, it's not too hard for Xiph to avoid this problem, as long as they
> >define how to handle these translations.  
> 
> Why should they define it? It's at the wrong level - let the system define
> the conversion.


Because that's not portable: systems disagree on the conversion tables.
Read http://www.debian.or.jp/~kubota/unicode-symbols.html

> >But the easy solution for
> >Ogg--0x5C to U+00A5--doesn't work for a lot of things.  I can't convert
> >everything from CP932 to standard Unicode this way; my C source
> >containing 'printf("Hi\n");' would no longer function, since the \ is
> >converted to a yen symbol.
> 
> Like anyone involved in this discussion couldn't have written code
> to convert the backslashes in C code intelligently in the time spent on
> this argument. Heck, we could probably have even traced variable usages
> to find what's used as a filename argument in this time. An Excel
> programmer could probably have done the exact same thing in this time.

Then you introduce all of the complexity and unreliability of
"intelligent" parsers, instead of the simplicity of translation tables.
It also means that iconv() simply won't work for this translation.
Every application that uses iconv() would have to know data types (to
know which parsers and heuristics to use) and have a special case for
this.
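
To make the table point concrete, here is a minimal sketch in Python. It
is illustrative only: make_table() and translate() are hypothetical names,
not part of iconv() or any real converter, and the table covers just the
single-byte 0x00-0x7F range. It shows that a pure table lookup is trivial,
and that the "easy" 0x5C to U+00A5 entry silently changes the C source
from the earlier example.

```python
# Illustrative sketch: a byte-for-byte translation table.
# Assumption: only the single-byte 0x00-0x7F range is handled; real CP932
# also has double-byte sequences, which are ignored here.

def make_table(yen_for_5c):
    """Build a table mapping bytes 0x00-0x7F to Unicode characters."""
    table = {b: chr(b) for b in range(0x80)}
    if yen_for_5c:
        # The "easy" mapping discussed above: 0x5C -> U+00A5 YEN SIGN.
        table[0x5C] = "\u00a5"
    return table

def translate(data, table):
    """Translate bytes to text by pure table lookup, with no heuristics."""
    return "".join(table[b] for b in data)

# The C source line from the example; the bytes contain 0x5C followed by 'n'.
source = b'printf("Hi\\n");'

as_backslash = translate(source, make_table(False))
as_yen = translate(source, make_table(True))

print(as_backslash)  # printf("Hi\n");
print(as_yen)        # printf("Hi¥n");
```

The table has no way to know whether a given 0x5C was meant as a currency
sign or as an escape character. Deciding that per byte is exactly the
data-type knowledge that every caller would otherwise have to supply.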

This isn't about "translating CP932 to Unicode once", it's about
allowing them to coexist peacefully, letting CP932 be phased out, as is
done with every other charset.

> There is an upgrade path: intelligently convert the character. I think
> fixing the problem now is better than everyone dealing with it for the
> next 40 years.

If it were that easy to do, we wouldn't be having this discussion (nor
would any of the others who have had this discussion so many times in
the past).

-- 
Glenn Maynard
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
