On Sat, 15 Jan 2011 14:51:47 -0500, Steven Schveighoffer
<[email protected]> wrote:
> I feel like you might be exaggerating, but maybe I'm completely wrong
> on this; I'm not well-versed in Unicode, or even in languages that
> require it. The clear benefit I see is that with a string type that
> normalizes to canonical code points, you can use the string in any
> algorithm without the algorithm being Unicode-aware, for *most
> languages*. At least, that is how I see it. I'm looking at it as a
> code-reuse proposition.
>
> It's like calendars. There are quite a few different calendars in
> different cultures, but most people use the Gregorian calendar. So we
> have three options:
>
> a) Use the Gregorian calendar, and leave the other calendars to a 3rd
> party library.
> b) Use a complicated calendar system where the Gregorian calendar is
> treated with equal respect to all other calendars, and none is the
> default.
> c) Use the Gregorian calendar by default, but include the other
> calendars as a separate module for those who wish to use them.
>
> I'm looking at my proposal as more of a c) solution.
>
> Can you show how normalization causes subtle bugs?
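For what it's worth, the benefit described above (normalize both strings
to a canonical form, and a Unicode-unaware equality check then gives the
right answer) can be sketched with Python's stdlib unicodedata module.
Python is used here purely for illustration; the thread is about D's
string design:

```python
import unicodedata

composed = "\u00e9"      # 'e-acute' as a single precomposed code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# Without normalization, a naive code-point comparison fails even
# though the two strings render identically:
assert composed != decomposed

# Normalizing both sides to NFC (canonical composition) lets a
# Unicode-unaware equality check give the expected answer:
nfc_a = unicodedata.normalize("NFC", composed)
nfc_b = unicodedata.normalize("NFC", decomposed)
assert nfc_a == nfc_b
```

The same idea applies to sorting, hashing, or substring search: any
algorithm that compares code points works on canonical forms without
knowing anything about combining characters.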
I see from Michel's post how automatic normalization can be bad. I also
see that it can be wasteful. So I've shifted my position.
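One concrete way automatic normalization can be bad, again sketched in
Python for illustration (this is my own example, not necessarily the one
from Michel's post): a string type that silently normalizes on
construction changes the code points, so byte-for-byte identity with
externally stored data is lost. HFS+, for instance, stores filenames in
a decomposed form.

```python
import unicodedata

# A filename as a system using decomposed form might store it:
stored_name = "re\u0301sume\u0301"   # 'resume' with two combining accents (NFD)

# If the string type silently normalized to NFC on construction,
# the code points would no longer match what the system stored:
normalized = unicodedata.normalize("NFC", stored_name)
assert normalized != stored_name   # code-point identity is lost

# The normalization also changed the length, which shows the
# wasteful side: every construction pays for a rewrite of the data.
assert len(stored_name) == 8   # NFD: 8 code points
assert len(normalized) == 6    # NFC: 6 code points
```

So a type that normalizes eagerly cannot round-trip such data, which is
exactly the kind of subtle bug that argues against normalizing by
default.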
Now I agree that we need a full Unicode-compliant string type as the
default. See my reply to Michel for more info on my revised proposal.
-Steve