On 28-May-2016 01:04, tsbockman wrote:
On Friday, 27 May 2016 at 20:42:13 UTC, Andrei Alexandrescu wrote:
On 05/27/2016 03:39 PM, Dmitry Olshansky wrote:
No, this is not the point of normalization.
What is? -- Andrei
1) A grapheme may include several combining characters (such as
diacritics) whose order is not supposed to be semantically significant.
Normalization sorts them in a standardized way so that string
comparisons return the expected result for graphemes which differ only
by the internal order of their constituent combining code points.
2) Some graphemes (like accented Latin letters) can be represented by a
single code point OR a letter followed by a combining diacritic.
Normalization either splits them all apart (NFD), or combines them
whenever possible (NFC). Again, this is primarily intended to make
things like string comparisons work as expected, and perhaps to simplify
low-level tasks like graphical rendering of text.
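Both points can be checked in a few lines. This is only a minimal sketch in Python, using the standard library's `unicodedata` module rather than anything from D or Phobos. The sample strings and variable names are my own picks for illustration and do not come from the thread.

```python
import unicodedata

# Point 1: combining marks in a different order.
# U+0323 COMBINING DOT BELOW has canonical combining class 220,
# U+0307 COMBINING DOT ABOVE has canonical combining class 230.
# Their relative order is not semantically significant, so
# normalization sorts them by combining class.
dot_below_first = "q\u0323\u0307"
dot_above_first = "q\u0307\u0323"

raw_equal = dot_below_first == dot_above_first
nfd_equal = (unicodedata.normalize("NFD", dot_below_first)
             == unicodedata.normalize("NFD", dot_above_first))

# Point 2: one code point versus letter + combining diacritic.
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT

nfc = unicodedata.normalize("NFC", decomposed)  # combines where possible
nfd = unicodedata.normalize("NFD", composed)    # splits apart

print(raw_equal, nfd_equal)                     # False True
print(nfc == composed, nfd == decomposed)       # True True
```

The raw comparison fails even though both strings would render as the same grapheme. Comparing after normalizing both sides to the same form (NFD here, though NFC would do as well) gives the expected result.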
Quite an accurate statement of the goals. Normalization is all about having
a canonical order of combining code points.
(Disclaimer: This is an oversimplification, because nothing about
Unicode is ever simple.)
--
Dmitry Olshansky