On 2011-06-21 5:09 AM, Marsh Ray wrote:
> There are certainly more bugs lurking where the complex rules of
> international character data "collide" with password hashing. How does a
> password login application work from a UTF-8 terminal (or web page) when
> the host is using a single-byte code page?

C and the existing C libraries were created when all the characters that anyone could ever want fitted in seven bits or less.

When we ran out of space, each hardware manufacturer and each programmer implemented his own incompatible solution ad hoc.

The solution, of course, is more bits. The world is now standardizing on Unicode. Anything that is more than seven bits, and less than Unicode, is asking for endless compatibility crises.

Eight-bit ASCII is a compatibility bug.
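
To make the code page problem concrete, here is a minimal Python sketch. The password "café" and the use of plain SHA-256 are my own assumptions for illustration only (a real system would use a salted, deliberately slow password hash), and canonical_password_bytes is a hypothetical helper, not anything from an existing library. It shows the same typed password turning into different bytes, and so different hashes, depending on which encoding the terminal or host applies.

```python
import hashlib
import unicodedata

# Illustrative password with one non-ASCII character (U+00E9).
password = "caf\u00e9"

# The same four characters become different byte strings.
utf8_bytes = password.encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = password.encode("latin-1")  # b'caf\xe9'

# SHA-256 stands in for the password hash purely for illustration.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()


def canonical_password_bytes(text: str) -> bytes:
    """Hypothetical helper: pick one normal form and one encoding
    before hashing, so every client feeds the hash the same bytes."""
    return unicodedata.normalize("NFC", text).encode("utf-8")
```

The two digests differ, so a password set from a UTF-8 terminal will not verify on a host that hashes the Latin-1 bytes. Fixing the normal form and the encoding before hashing is one way to avoid that, assuming every entry point can be made to do it.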

> I once looked up the Unicode algorithm for some basic "case insensitive"
> string comparison... 40 pages!

When one goes truly international, case insensitivity is an AI-hard problem. Only someone with an intimate knowledge of the culture can tell you if two text strings are in some sense the same, when they are not exactly alike.
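
A small Python sketch of why this is hard. The function names naive_equal and folded_equal are mine, and the German and Turkish strings are just examples I picked. Lowercasing both sides misses the German sharp s, Unicode default case folding catches that one, and neither knows that a Turkish reader pairs dotless i with capital I.

```python
def naive_equal(a: str, b: str) -> bool:
    """Compare after simple lowercasing."""
    return a.lower() == b.lower()


def folded_equal(a: str, b: str) -> bool:
    """Compare after Unicode default (locale-independent) case folding."""
    return a.casefold() == b.casefold()


# German: sharp s (U+00DF) is conventionally capitalized as "SS".
german_lower = "stra\u00dfe"
german_upper = "STRASSE"

# Turkish: dotless i (U+0131) is the lowercase partner of "I".
turkish_lower = "\u0131"
turkish_upper = "I"

german_naive = naive_equal(german_lower, german_upper)      # False
german_folded = folded_equal(german_lower, german_upper)    # True
turkish_folded = folded_equal(turkish_lower, turkish_upper) # False
```

The default folding is locale-independent, so getting the Turkish case right takes language-specific tailoring, which means knowing what language the text is in, and that is exactly the cultural knowledge the code does not have.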

Humans are so good at judging that two things are almost the same, or very similar, that we tend to overlook small differences, where computers are incapable of noticing the similarity. This is apt to create insoluble UI issues, for example the difference between egold.com (all alphabetic) and ego1d.com (letters and numbers). One has to design around such problems. Don't go there!
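
For the egold.com example only, a toy Python sketch of what a program would have to do to notice what the eye glosses over. The LOOKALIKES table and the skeleton function are hypothetical and cover just two digit-for-letter swaps that I chose by hand. Any real table of visually similar characters is far larger and still incomplete, which is the reason to design around the problem rather than try to solve it this way.

```python
# Hypothetical, hand-picked table: digits that are easily read as letters.
LOOKALIKES = {"1": "l", "0": "o"}


def skeleton(name: str) -> str:
    """Map each character to the letter it is commonly mistaken for."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in name.lower())


def looks_alike(a: str, b: str) -> bool:
    """True when two different strings collapse to the same skeleton."""
    return a != b and skeleton(a) == skeleton(b)


# To a plain comparison the two names are simply different strings.
exact_match = "egold.com" == "ego1d.com"        # False
confusable = looks_alike("egold.com", "ego1d.com")  # True
```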
_______________________________________________
cryptography mailing list
[email protected]
http://lists.randombit.net/mailman/listinfo/cryptography