On Sunday, 9 March 2014 at 13:00:46 UTC, monarch_dodra wrote:
As for "the belief that iterating by code point has utility", I have to strongly disagree. Unicode is composed of code points, and that is what we handle. The fact that it can be encoded and stored as UTF is an implementation detail.

But you don't deal with Unicode. You deal with *text*. Unless you are implementing Unicode algorithms, code points solve nothing in the general case.

Seriously, Bearophile suggested "ABCD".sort(), and it took about 6 pages (!) for someone to point out this would be wrong.

Sorting a string has quite limited use in the general case, so I think this is another artificial example.

Even Walter pointed out that such code should work. *Maybe* it is still wrong with regard to graphemes and normalization, but at *least* the result is not a corrupted UTF-8 stream.

I think this is no worse than having all the combining marks clustered at the end of the string, and thus attached to the last non-combining letter.
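
To make the two failure modes concrete, here is a minimal sketch in Python (not D, and not code from this thread; the function names are made up for illustration). It sorts one string by code point and another by UTF-8 code unit. The first result is still valid UTF-8, but the combining accent has moved from 'a' to the last letter; the second result no longer decodes at all.

```python
def sort_code_points(text: str) -> str:
    """Sort a string by code point (Python's str iterates by code point)."""
    return "".join(sorted(text))


def sort_code_units(text: str) -> bytes:
    """Sort the UTF-8 code units (bytes) of a string, ignoring sequence boundaries."""
    return bytes(sorted(text.encode("utf-8")))


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 'a' followed by U+0301 COMBINING ACUTE ACCENT, then "bc".
accented = "a\u0301bc"

# U+0301 sorts after every ASCII letter, so the accent ends up after 'c'.
by_code_point = sort_code_points(accented)
assert by_code_point == "abc\u0301"
assert is_valid_utf8(by_code_point.encode("utf-8"))

# U+00E9 encodes as the bytes C3 A9; sorting bytes puts A9 before C3,
# leaving a continuation byte with no lead byte in front of it.
by_code_unit = sort_code_units("\u00e9a")
assert by_code_unit == b"\x61\xa9\xc3"
assert not is_valid_utf8(by_code_unit)
```

Whether a misplaced accent counts as better than an undecodable stream is exactly the point in dispute; the sketch only shows that both outcomes are wrong as text.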
