Re: The Case Against Autodecode

Minas Mina via Digitalmars-d Fri, 27 May 2016 15:17:01 -0700

On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:

> On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
>
>> On 27-May-2016 21:11, Andrei Alexandrescu wrote:
>>
>>> On 5/27/16 10:15 AM, Chris wrote:
>>>> It has happened to me that characters like "é" return length == 2
>>> Would normalization make length 1? -- Andrei
>>
>> No, this is not the point of normalization.
>
> What is? -- Andrei


Here is an example of what normalization is for.

In Unicode, the grapheme Ä can be composed of two code points: A (the ASCII A, U+0041) followed by the combining ¨ character (U+0308, COMBINING DIAERESIS).

However, one of the goals of Unicode was to be backwards compatible with earlier encodings that extended ASCII (code pages).

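In some code pages, Ä was a single code point (0xC4 in ISO 8859-1, for example), which is why Unicode also has it as the precomposed code point U+00C4.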

So in some cases you would have the Unicode form, which is two code points, and in others the form inherited from the code pages, which is one.
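To make the two spellings concrete, here is a small sketch. It uses Python's standard `unicodedata` module rather than D, purely for illustration; the variable names are made up for the example. It shows that both strings are the same grapheme but differ in code point count, UTF-8 byte count, and plain equality.

```python
import unicodedata

# Two ways to spell "Ä" in Unicode (illustrative variable names).
precomposed = "\u00C4"  # LATIN CAPITAL LETTER A WITH DIAERESIS: one code point
decomposed = "A\u0308"  # "A" followed by COMBINING DIAERESIS: two code points

for label, s in (("precomposed", precomposed), ("decomposed", decomposed)):
    names = [unicodedata.name(c) for c in s]
    # len(s) counts code points; the encoded length counts UTF-8 code units (bytes).
    print(label, len(s), "code point(s),", len(s.encode("utf-8")), "UTF-8 bytes:", names)

# Same grapheme on screen, but a plain comparison sees different code points.
print(precomposed == decomposed)  # False
```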

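Those should be the same, though, i.e. compare equal. That is what normalization is for: it rewrites both into one canonical form. The decomposed forms (NFD, NFKD) _expand_ the single code point Ä into A + ¨, while the composed forms (NFC, NFKC) do the reverse and combine A + ¨ into the single code point Ä.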
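A minimal sketch of normalization in action, again using Python's standard `unicodedata.normalize` as a stand-in for illustration (not D's API). After normalizing both spellings to the same form they compare equal. Note that NFC yields one code point but still two UTF-8 bytes, so a UTF-8 code-unit count such as D's `string.length` would still report 2.

```python
import unicodedata

precomposed = "\u00C4"  # Ä as a single code point
decomposed = "A\u0308"  # A + combining diaeresis

# NFD expands precomposed characters into base letter + combining marks.
nfd_a = unicodedata.normalize("NFD", precomposed)
nfd_b = unicodedata.normalize("NFD", decomposed)

# NFC composes base + combining marks into a single code point where one exists.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)

print(nfd_a == nfd_b, len(nfd_a))  # True 2
print(nfc_a == nfc_b, len(nfc_a))  # True 1

# Code points vs. UTF-8 code units: NFC gives 1 code point, but still 2 bytes.
print(len(nfc_a.encode("utf-8")))  # 2
```

Which form to pick is a design choice: NFC tends to be shorter for Latin text, while NFD makes stripping or inspecting combining marks easier.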
