Stewart Stremler wrote:

begin  quoting Andrew Lentvorski as of Tue, Oct 25, 2005 at 05:26:28PM -0700:
Stewart Stremler wrote:

I think the problem is that unicode tried to solve the wrong problem.
The real problem wasn't "how do we let everyone have single-character
glyphs", but "how do we let people write in their own language on a
computer".  Since we're ready to accept bloat at the outset, a better
approach (to my way of thinking) would be to toss out ANSI, stick with
ASCII, and redefine those ANSI characters as indicators for variable
length strings that should constitute a glyph.
That pretty much describes UTF-8. So, what is your particular beef with UTF-8?
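
As a rough illustration of that point, here is a minimal Python sketch. The sample characters and the helper name `utf8_layout` are my own choices, not anything from the thread. It shows that ASCII stays one byte, while every other character becomes a run of bytes with the high bit set, where the lead byte announces the length of the sequence:

```python
def utf8_layout(ch):
    """Return the UTF-8 bytes of one character as 8-bit binary strings."""
    return [format(b, "08b") for b in ch.encode("utf-8")]

# Hypothetical samples covering the 1-, 2-, 3- and 4-byte cases.
samples = ["A", "\u00e9", "\u20ac", "\U0001d11e"]

for ch in samples:
    # Lead bytes start with 0, 110, 1110 or 11110;
    # continuation bytes always start with 10.
    print(f"U+{ord(ch):04X}", utf8_layout(ch))
```

The bytes that ANSI code pages used for extra characters (0x80 and up) are exactly the ones UTF-8 reserves for lead and continuation bytes.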

UTF-8 doesn't use printable characters.
Well, neither does ASCII: it has control characters too, which is why even in the ASCII days it was handy to have a shorthand for "printable characters" when building grammars and regexps. It turns out being able to support "non-printable" characters is handy.

Consequently, if I see a UTF-8 "sequence", I get a ? or an empty box,
and NO way to tell what's actually there without installing some sort
of appropriate font.  (Well, I can dump it to a file and use od...)
Actually, if your system had a font with full Unicode coverage (such beasties do exist), there'd be no question mark. Furthermore, not all software displays a question mark. Some programs choose to render the Unicode character value in octal (sometimes hex, although that is uncommon). Of course, this often turns out to be less helpful than the question mark. ;-)
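
For anyone who wants to see what is actually there without a font, a minimal sketch of that kind of fallback rendering in Python follows. The function name `show_codepoints` and the `<U+XXXX>` notation are assumptions for illustration; the `backslashreplace` error handler is part of the standard library:

```python
def show_codepoints(text):
    """Replace every non-ASCII character with a visible <U+XXXX> marker."""
    return "".join(
        ch if ord(ch) < 128 else f"<U+{ord(ch):04X}>" for ch in text
    )

sample = "na\u00efve"  # hypothetical sample containing U+00EF

print(show_codepoints(sample))

# The standard library offers a similar fallback when encoding to ASCII.
print(sample.encode("ascii", "backslashreplace"))
```

This does roughly what od does, but keeps the readable ASCII parts readable.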

UTF-8 tries to make sure that nothing not an ASCII character looks
like an ASCII character; I'm not entirely convinced that this is an important issue. Perhaps it is and I just haven't grok'd the need.
Yeah, that was actually a very deliberate decision, and if you think about it, it turns out to be very important if you are trying to make it so UTF-8 can be dropped into software that is used to dealing with ASCII with minimal consequences. It also keeps parsing of mostly-ASCII text efficient, since every ASCII character is still a single byte (this holds for 7-bit ASCII, not for the upper half of latin-1).
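
A small Python sketch of why this matters, using made-up sample data: a byte-oriented split on an ASCII comma is safe on UTF-8, because no byte of a multi-byte sequence can equal an ASCII byte. The UTF-16 contrast uses U+2C41, a character I picked only because its big-endian bytes happen to be 0x2C 0x41, that is, ",A":

```python
# Hypothetical comma-separated record containing non-ASCII text.
record = "caf\u00e9,na\u00efve,\u65e5\u672c\u8a9e".encode("utf-8")

# A byte-level split that knows nothing about UTF-8 still works.
fields = [f.decode("utf-8") for f in record.split(b",")]
print(fields)

# Every byte belonging to a non-ASCII character has the high bit set.
high_bytes_ok = all(
    b >= 0x80
    for ch in "\u00e9\u00ef\u65e5\u672c\u8a9e"
    for b in ch.encode("utf-8")
)
print(high_bytes_ok)

# Contrast: in UTF-16 a non-ASCII character can contain a "comma" byte.
print("\u2c41".encode("utf-16-be"))
print(b"," in "\u2c41".encode("utf-8"))
```

Legacy tools that scan for '/', NUL, newline, or quotes therefore keep working on UTF-8 input without modification.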

Heh. My issue *is* Unicode. I believe that Unicode was a solution that was arrived at early and all the brainpower was put into making it work instead of asking "is this the right thing to do?" This is often the case with smart people, I find... they *can* make it work, so they don't
stop to think about whether it's worth it.
I disagree. Completely. Unicode means that I can just have a single "String" abstraction that works across multiple human and computer languages.

UTF8 does give you that. UTF-16 (or is it UCS-16?) doesn't.

So it's not the string abstraction that's the problem, it's the encoding
of glyphs.  Wide-characters seem to be the most common implementation,
and they *suck*.
Wide characters are not the most common implementation. UTF-8 is.

While UTF-16 (or UCS-2, btw) has a different set of advantages and disadvantages than UTF-8 (as does UTF-32), I can't see how it affects your ability to have a "String" abstraction that works across multiple human and computer languages.
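
To make that concrete, here is a minimal Python sketch under my own assumptions (the sample text and the helper name `encoded_sizes` are illustrative). The same string round-trips through UTF-8, UTF-16, and UTF-32; only the byte counts differ, while the string itself is the same sequence of code points either way:

```python
def encoded_sizes(text, codecs=("utf-8", "utf-16-le", "utf-32-le")):
    """Round-trip text through each codec and return the encoded sizes."""
    sizes = {}
    for codec in codecs:
        data = text.encode(codec)
        # The decoded string is identical whatever the wire encoding was.
        assert data.decode(codec) == text
        sizes[codec] = len(data)
    return sizes

# Hypothetical sample: Latin with an accent, CJK, and one non-BMP character.
text = "na\u00efve \u65e5\u672c\u8a9e \U0001d11e"

print(len(text), "code points")
print(encoded_sizes(text))
```

The trade-offs are about size and indexing cost of the encoded form, not about what the string abstraction can represent.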

I don't disagree that everyone's glyphs should be represented. But
Unicode even compromised on that.  We have *simplified* collections of
glyphs.
Actually, they have the simplified collections and more extended collections as well. In many cases Unicode handles a broader set of glyphs than other encoding formats.

And Unicode introduces *another* problem -- the problem of too-similar
glyphs *explodes*.  This is a security issue -- a boon to phishers all
over the world.  If I can't set my locale (or toggle my display) so that
the extended character sequences show up as non-ambiguous character
sequences, I have a problem with the whole mess from my standpoint as
a user.
Wait, up above you were claiming that any extended character sequences are presented as question marks.... that would seem to really screw a phisher if you ask me. ;-)
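
For reference, a minimal Python sketch of the look-alike problem and of the kind of unambiguous display being asked for. The strings and the helper names `describe` and `unambiguous` are hypothetical; `unicodedata.name` and the `backslashreplace` handler are standard library:

```python
import unicodedata

latin = "example"
spoof = "ex\u0430mple"  # U+0430 is Cyrillic, but renders much like Latin "a"

def describe(s):
    """Name every non-ASCII character in s."""
    return [unicodedata.name(ch) for ch in s if ord(ch) > 127]

def unambiguous(s):
    """Render s with every non-ASCII character as a visible escape."""
    return s.encode("ascii", "backslashreplace").decode("ascii")

print(latin == spoof)       # the two strings are not equal
print(describe(spoof))
print(unambiguous(spoof))
```

A display mode along these lines makes the substitution obvious, at the cost of making legitimate non-ASCII text unreadable.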

That said, TLS/SSL certificates are *supposed* to be managed in such a way that getting a certificate for such a domain should be impossible, and ultimately, only the certificate can be trusted. Since people don't look at the certificate, they're already taking a huge risk and exposing themselves to phishing, unicode or no.

--Chris

--
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
