On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
On 27-May-2016 21:11, Andrei Alexandrescu wrote:
On 5/27/16 10:15 AM, Chris wrote:
It has happened to me that characters like "é" return length == 2

Would normalization make length 1? -- Andrei

No, this is not the point of normalization.

What is? -- Andrei

Here is an example about normalization.

In Unicode, the grapheme Ä can be composed of two code points: A (the ASCII A) and the combining ¨ character (U+0308).

However, one of the goals of Unicode was to be backwards compatible with earlier encodings that extended ASCII (codepages).
In some codepages, Ä was an actual codepoint.

So in some cases you would have the Unicode one, which is two codepoints, and the one carried over from some codepages, which is a single codepoint.

Those should be the same though, i.e. compare the same. In order to do that, there is normalization. What it does (in its decomposing form, NFD) is to _expand_ the single codepoint Ä into A + ¨
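
To make that concrete, here is a minimal sketch in Python (not D, purely for illustration), using the standard unicodedata module. The helper name canonically_equal is made up for this example. It assumes the two-codepoint spelling uses U+0308 COMBINING DIAERESIS. As far as I know, the composing form (NFC) goes the other way and folds A + ¨ back into the single codepoint.

```python
import unicodedata

precomposed = "\u00C4"   # Ä as one code point (the form inherited from codepages)
decomposed = "A\u0308"   # A followed by U+0308 COMBINING DIAERESIS

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# NFD expands the single code point into A + combining diaeresis.
nfd = unicodedata.normalize("NFD", precomposed)

# NFC composes the pair back into the single code point.
nfc = unicodedata.normalize("NFC", decomposed)

print(len(precomposed), len(decomposed))           # code point counts differ
print(precomposed == decomposed)                   # raw comparison fails
print(canonically_equal(precomposed, decomposed))  # normalized comparison succeeds
```

So without normalization the two spellings compare unequal, and after normalizing both sides to the same form they compare equal.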

