On Sat, 15 Jan 2011 14:51:47 -0500, Steven Schveighoffer <[email protected]> wrote:

I feel like you might be exaggerating, but maybe I'm completely wrong on this; I'm not well-versed in Unicode, or even in the languages that require it. The clear benefit I see is that with a string type that normalizes to canonical code points, you can use it in any algorithm without the algorithm being Unicode-aware, at least for *most* languages. That is how I see it, anyway: I'm looking at it as a code-reuse proposition.
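To make the code-reuse point concrete, here is a small sketch (not from the thread; Python's `unicodedata` module is used purely for illustration) of how canonical normalization lets a plain, non-Unicode-aware comparison do the right thing:

```python
import unicodedata

precomposed = "caf\u00e9"   # 'é' as a single code point (U+00E9)
decomposed  = "cafe\u0301"  # 'e' followed by combining acute accent (U+0301)

# A naive code-point comparison sees two different strings,
# even though both render as "café".
assert precomposed != decomposed

# After normalizing both to canonical composed form (NFC),
# they compare equal, so an ordinary equality/sort/search
# algorithm needs no Unicode awareness of its own.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
assert nfc_a == nfc_b
```

The idea is that if the string type normalized on construction, every generic algorithm downstream would get this behavior for free.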

It's like calendars. There are quite a few different calendars in different cultures. But most people use a Gregorian calendar. So we have three options:

a) Use a Gregorian calendar, and leave the other calendars to a 3rd party library
b) Use a complicated calendar system where Gregorian calendars are treated with equal respect to all other calendars, and none is the default
c) Use a Gregorian calendar by default, but include the other calendars as a separate module for those who wish to use them

I'm looking at my proposal as more of a c) solution.

Can you show how normalization causes subtle bugs?

I see from Michel's post how automatic normalization can be bad. I also see that it can be wasteful. So I've shifted my position.

Now I agree that we need a fully Unicode-compliant string type as the default. See my reply to Michel for more info on my revised proposal.

-Steve
