Re: Unicode, character ambiguities

David Starner Sun, 13 Jan 2002 18:31:49 -0800

On Sun, Jan 13, 2002 at 08:43:40PM -0500, Glenn Maynard wrote:
> On Sun, Jan 13, 2002 at 06:06:11PM -0600, David Starner wrote:
> > > Because that's not portable.  Read
> > > http://www.debian.or.jp/~kubota/unicode-symbols.html.
> > 
> > I know the problem. It still doesn't mean that every file format that
> > includes Unicode should define its own solution.
> 
> So we should sit back, accept Unicode as nonportable, and provide things
> like RFC2047 so people can use other encodings?  No thanks.


Is ISO-8859-1 not portable because you can't round trip CP932 through
it? Why does CP932's lack of definition make Unicode unportable? People
already pound Unicode for compromises with older systems; one more won't
make people love it.

> And if we simply say "use UTF-8", and people use whatever translation
> tables their system happens to use, then it's a lot harder to fix things
> if and when Unicode standardizes it.  

People are going to use whatever translation tables their system happens
to use. Some systems are going to translate all strings to UTF-8 as
standard practice - Java-based systems, for example, and Gnome looks
like it's heading that way. Others just aren't going to be interested in
messing around with it - ANSIToUnicode, or iconv, or whatever the
library call is, already does it, so why are they going to reinvent the
wheel?
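The divergence can be sketched in Python. The two dictionaries below are hypothetical three-entry excerpts standing in for whatever table a given system's conversion library ships. The byte-to-character pairs reflect the well-known differences between JIS-based Shift_JIS tables and Microsoft's CP932 table, but the `decode` helper is a toy, not the API of iconv or any other library.

```python
# Hypothetical excerpts of two byte -> Unicode conversion tables for the
# same bytes. JIS_TABLE follows JIS X 0201/0208-based Shift_JIS mappings;
# MS_TABLE follows Microsoft's CP932 mappings.
JIS_TABLE = {
    b"\x5c": "\u00a5",      # YEN SIGN
    b"\x81\x60": "\u301c",  # WAVE DASH
    b"\x81\x7c": "\u2212",  # MINUS SIGN
}
MS_TABLE = {
    b"\x5c": "\u005c",      # REVERSE SOLIDUS (backslash)
    b"\x81\x60": "\uff5e",  # FULLWIDTH TILDE
    b"\x81\x7c": "\uff0d",  # FULLWIDTH HYPHEN-MINUS
}


def decode(data: bytes, table: dict[bytes, str]) -> str:
    """Toy decoder: look up 2-byte then 1-byte sequences in `table`;
    other ASCII bytes pass through unchanged, anything else is an error."""
    out = []
    i = 0
    while i < len(data):
        for width in (2, 1):
            chunk = data[i:i + width]
            if len(chunk) == width and chunk in table:
                out.append(table[chunk])
                i += width
                break
        else:
            if data[i] < 0x80:
                out.append(chr(data[i]))
                i += 1
            else:
                raise ValueError(f"unmapped byte 0x{data[i]:02x} at offset {i}")
    return "".join(out)


sample = b"C:\x5cdata \x81\x60"
print(ascii(decode(sample, JIS_TABLE)))  # 'C:\xa5data \u301c'
print(ascii(decode(sample, MS_TABLE)))   # 'C:\\data \uff5e'
```

The same bytes come out as different Unicode text depending only on which table the system happened to use.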

> > Yes? The main difference I see between my solution and yours is that
> > yours introduces "intelligent" parsers into every Unicode system,
> > whereas mine deals with it in one place, where the conversion from
> > CP932 happens.
> 
> I'm not advocating "intelligent" parsers at all.  (In fact, all of the
> suggested solutions have their problems; I believe this particular
> suggestion has by far the most.)

What was your solution? I got that you expected systems to display the
backslash as the yen sign under certain conditions. Right?
 
> > Apparently they have a hard time coexisting - the poor semantics are
> > CP932's fault, not Unicode's. I don't see how transferring that bug to
> > Unicode will help things in the long run.
> 
> It doesn't matter whose "fault" it is

Actually, it does. Part of Unicode's success is that it's a simpler
solution than dealing with dozens of charsets. If you import the bugs of
dozens of charsets into Unicode, it loses part of that.

Yes, Unicode should offer a unified translation table. Barring that, the
tables available at http://www.w3.org/TR/japanese-xml/ could be
referenced - accepting that some systems won't or can't follow the
recommendations. But importing the quirks and problems of other charsets
(separate from those inherent in the script) into Unicode won't help
things in the long run.
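Why one agreed table matters for round-tripping can be sketched the same way. In this minimal Python example, assuming hypothetical one-entry tables (`JIS_STYLE`, `MS_STYLE`, and `round_trip` are illustrative names, not any library's API), text decoded with a JIS-style table and re-encoded with a Microsoft-style one cannot get back to the original bytes, while using the same table in both directions can.

```python
from typing import Optional

# Hypothetical one-entry excerpts of two CP932 -> Unicode tables that
# disagree about the byte sequence 0x81 0x60.
JIS_STYLE = {b"\x81\x60": "\u301c"}  # WAVE DASH
MS_STYLE = {b"\x81\x60": "\uff5e"}   # FULLWIDTH TILDE


def round_trip(seq: bytes, decode_table: dict[bytes, str],
               encode_table: dict[bytes, str]) -> Optional[bytes]:
    """Decode seq with decode_table, then re-encode the resulting character
    using the inverse of encode_table. Returns None if no mapping leads back."""
    char = decode_table[seq]
    inverse = {c: s for s, c in encode_table.items()}
    return inverse.get(char)


print(round_trip(b"\x81\x60", JIS_STYLE, JIS_STYLE))  # b'\x81`'
print(round_trip(b"\x81\x60", JIS_STYLE, MS_STYLE))   # None
```

With a single shared table, the first case is the only one that can occur.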

-- 
David Starner - [EMAIL PROTECTED], dvdeug/jabber.com (Jabber)
Pointless website: http://dvdeug.dhis.org
When the aliens come, when the deathrays hum, when the bombers bomb,
we'll still be freakin' friends. - "Freakin' Friends"
--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
