begin quoting Andrew Lentvorski as of Tue, Oct 25, 2005 at 05:26:28PM -0700:
> Stewart Stremler wrote:
> 
> >I think the problem is that unicode tried to solve the wrong problem.
> >The real problem wasn't "how do we let everyone have single-character
> >glyphs", but "how do we let people write in their own language on a
> >computer". Since we're ready to accept bloat at the outset, a better
> >approach (to my way of thinking) would be to toss out ANSI, stick with
> >ASCII, and redefine those ANSI characters as indicators for variable
> >length strings that should constitute a glyph.
> 
> That pretty much describes UTF-8. So, what is your particular beef with
> UTF-8?
UTF-8 doesn't use printable characters. Consequently, if I see a UTF-8
"sequence", I get a ? or an empty box, and NO way to tell what's actually
there without installing an appropriate font. (Well, I can dump it to a
file and use od...)

What I *want* from my software is the ability to set a mode/locale/whatnot
that displays the leading glyph as something appropriate (say, a little x,
a character count, and a little arrow), followed by a sequence of normal
ASCII characters.

UTF-8 tries to make sure that nothing that isn't an ASCII character looks
like an ASCII character; I'm not entirely convinced that this is an
important issue. Perhaps it is and I just haven't grokked the need.

> >Heh. My issue *is* Unicode. I believe that Unicode was a solution that
> >was arrived at early and all the brainpower was put into making it work
> >instead of asking "is this the right thing to do?" This is often the
> >case with smart people, I find... they *can* make it work, so they don't
> >stop to think about whether it's worth it.
> 
> I disagree. Completely. Unicode means that I can just have a single
> "String" abstraction that works across multiple human and computer
> languages.

UTF-8 does give you that. UTF-16 (or is it UCS-2?) doesn't. So it's not
the string abstraction that's the problem, it's the encoding of glyphs.
Wide characters seem to be the most common implementation, and they
*suck*.

> The cacophony of "String" data types in various programming languages
> and libraries prior to Unicode shows that a solution was needed. I
> don't see how any other solution will avoid dealing with the same issues
> as Unicode addressed.

The cacophony of "string" data types is not solved by Unicode; trying to
introduce Unicode seems to make the cacophony *worse*.

I don't disagree that everyone's glyphs should be represented. But
Unicode even compromised on that. We have *simplified* collections of
glyphs.
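[The ASCII-transparency property mentioned above is easy to check
directly. A minimal Python sketch, not part of the original message: an
ASCII character encodes to a single byte below 0x80, while every byte of
a multi-byte UTF-8 sequence has its high bit set, so no byte of a
non-ASCII character can be mistaken for an ASCII character.]

```python
# Illustrative sketch: UTF-8's ASCII-transparency property.
for ch in ["A", "é", "€", "好"]:
    data = ch.encode("utf-8")
    if len(data) == 1:
        # ASCII characters occupy one byte in the 0x00-0x7F range.
        assert data[0] < 0x80
    else:
        # Every byte of a multi-byte sequence is >= 0x80.
        assert all(b >= 0x80 for b in data)
    print(f"{ch!r} -> {[hex(b) for b in data]}")
```

[This is also why a byte-oriented tool like od can at least recover the
sequence boundaries even when the display font can't render the glyph.]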
And Unicode introduces *another* problem -- the problem of too-similar
glyphs *explodes*. This is a security issue -- a boon to phishers all
over the world.

If I can't set my locale (or toggle my display) so that the extended
character sequences show up as unambiguous character sequences, the whole
mess is a problem for me as a user.

-Stewart

-- 
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
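[The too-similar-glyphs problem described above can be demonstrated
concretely. A small Python sketch with a made-up spoof string: two
strings that render identically in many fonts but compare unequal
because they use different code points.]

```python
import unicodedata

latin = "paypal"            # all Latin letters
spoof = "p\u0430yp\u0430l"  # Cyrillic а (U+0430) swapped in for Latin a
print(latin == spoof)       # False: same glyphs, different code points
for label, s in [("latin", latin), ("spoof", spoof)]:
    print(label, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in s])
```

[A user-visible "show me the code points" display mode, as requested
above, is exactly what defuses this kind of spoofing.]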
