begin quoting Andrew Lentvorski as of Tue, Oct 25, 2005 at 05:26:28PM -0700:
> Stewart Stremler wrote:
> 
> >I think the problem is that unicode tried to solve the wrong problem.
> >The real problem wasn't "how do we let everyone have single-character
> >glyphs", but "how do we let people write in their own language on a
> >computer". Since we're ready to accept bloat at the outset, a better
> >approach (to my way of thinking) would be to toss out ANSI, stick with
> >ASCII, and redefine those ANSI characters as indicators for variable
> >length strings that should constitute a glyph.
> 
> That pretty much describes UTF-8. So, what is your particular beef with
> UTF-8?
UTF-8 doesn't use printable characters. Consequently, if I see a UTF-8
"sequence", I get a ? or an empty box, and NO way to tell what's actually
there without installing an appropriate font. (Well, I can dump it to a
file and use od...)

What I *want* from my software is the ability to set a mode/locale/whatnot
that displays the leading glyph as something appropriate (say, a little x,
a character count, and a little arrow), followed by a sequence of normal
ASCII characters.

UTF-8 tries to make sure that nothing that isn't an ASCII character looks
like an ASCII character; I'm not entirely convinced that this is an
important issue. Perhaps it is and I just haven't grokked the need.

> >Heh. My issue *is* Unicode. I believe that Unicode was a solution that
> >was arrived at early and all the brainpower was put into making it work
> >instead of asking "is this the right thing to do?" This is often the
> >case with smart people, I find... they *can* make it work, so they don't
> >stop to think about whether it's worth it.
> 
> I disagree. Completely. Unicode means that I can just have a single
> "String" abstraction that works across multiple human and computer
> languages.

UTF-8 does give you that. UTF-16 (or is it UCS-2?) doesn't. So it's not
the string abstraction that's the problem, it's the encoding of glyphs.
Wide characters seem to be the most common implementation, and they
*suck*.

> The cacophony of "String" data types in various programming languages
> and libraries prior to Unicode shows that a solution was needed. I
> don't see how any other solution will avoid dealing with the same issues
> as Unicode addressed.

The cacophony of "string" data types is not solved by Unicode; trying to
introduce Unicode seems to make the cacophony *worse*.

I don't disagree that everyone's glyphs should be represented. But
Unicode even compromised on that. We have *simplified* collections of
glyphs.
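[The ASCII-transparency property mentioned above is easy to check
directly. A minimal Python sketch, not part of the original message: an
ASCII character encodes to a single byte below 0x80, while every byte of
a multi-byte UTF-8 sequence has its high bit set, so no byte of a
non-ASCII character can be mistaken for an ASCII character.]

```python
# Illustrative sketch: UTF-8's ASCII-transparency property.
for ch in ["A", "é", "€", "好"]:
    data = ch.encode("utf-8")
    if len(data) == 1:
        # ASCII characters occupy one byte in the 0x00-0x7F range.
        assert data[0] < 0x80
    else:
        # Every byte of a multi-byte sequence is >= 0x80.
        assert all(b >= 0x80 for b in data)
    print(f"{ch!r} -> {[hex(b) for b in data]}")
```

[This is also why a byte-oriented tool like od can at least recover the
sequence boundaries even when the display font can't render the glyph.]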
And Unicode introduces *another* problem -- the problem of too-similar
glyphs *explodes*. This is a security issue -- a boon to phishers all
over the world.

If I can't set my locale (or toggle my display) so that the extended
character sequences show up as unambiguous character sequences, the whole
mess is a problem for me as a user.

-Stewart

-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
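[The too-similar-glyphs problem described above can be demonstrated
concretely. A small Python sketch with a made-up spoof string: two
strings that render identically in many fonts but compare unequal
because they use different code points.]

```python
import unicodedata

latin = "paypal"            # all Latin letters
spoof = "p\u0430yp\u0430l"  # Cyrillic а (U+0430) swapped in for Latin a
print(latin == spoof)       # False: same glyphs, different code points
for label, s in [("latin", latin), ("spoof", spoof)]:
    print(label, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

[A user-visible "show me the code points" display mode, as requested
above, is exactly what defuses this kind of spoofing.]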
