On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
Pretty much everything. Consider s and s1 string variables with possibly
different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It always returns false
without.


False. Many characters can be represented by different sequences of codepoints. For instance, ê can be ê as one codepoint, or e followed by ^ as a combining modifier. ö is
one such character.
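
To make the two representations concrete, here is a minimal sketch in D. It assumes Phobos' `std.uni.normalize` and the autodecoding of `string` ranges into code points; the helper names `allOUmlaut` and `allOUmlautNormalized` are hypothetical and not from the thread.

```d
import std.algorithm.searching : all;
import std.range : walkLength;
import std.uni : NFC, normalize;

// Hypothetical helper: true if every code point of s is U+00F6 (ö).
// Iterating a string as a range autodecodes it into dchar code points.
bool allOUmlaut(string s)
{
    return s.all!(c => c == '\u00F6');
}

// Hypothetical helper: the same check after composing the input to NFC.
bool allOUmlautNormalized(string s)
{
    return normalize!NFC(s).all!(c => c == '\u00F6');
}

void main()
{
    string precomposed = "\u00F6";  // ö as a single code point
    string decomposed  = "o\u0308"; // o followed by COMBINING DIAERESIS

    assert(precomposed.walkLength == 1); // one code point
    assert(decomposed.walkLength == 2);  // two code points, one visible character

    assert(allOUmlaut(precomposed));
    assert(!allOUmlaut(decomposed));          // code point comparison misses it
    assert(allOUmlautNormalized(decomposed)); // NFC composes o + U+0308 into U+00F6
}
```

Normalizing before comparing is only one way to handle this; comparing whole graphemes would be another, and neither is what the quoted one-liner does.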

There are 3 levels of Unicode support. What Andrei is talking about is Level 1.

http://unicode.org/reports/tr18/tr18-5.1.html

I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.

There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? It's an interesting idea, but it's not how things are.
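
As a small illustration of the ordering question: both orderings of a set of diacritics are valid input, but Unicode normalization puts combining marks of different combining classes into one canonical order, so the two spellings compare equal once normalized. A hedged sketch in D, assuming `std.uni.normalize`; the helper name `canonicallyEqual` and the Latin example (dot below, U+0323, and dot above, U+0307) are illustrative and not from the thread.

```d
import std.uni : NFD, normalize;

// Hypothetical helper: compares two strings after canonical decomposition,
// which also sorts adjacent combining marks by combining class.
bool canonicallyEqual(string a, string b)
{
    return normalize!NFD(a) == normalize!NFD(b);
}

void main()
{
    string belowFirst = "a\u0323\u0307"; // a, dot below, dot above
    string aboveFirst = "a\u0307\u0323"; // a, dot above, dot below

    assert(belowFirst != aboveFirst);                 // different code point sequences
    assert(canonicallyEqual(belowFirst, aboveFirst)); // same text after normalization
    assert(normalize!NFD(aboveFirst) == belowFirst);  // dot below sorts before dot above
}
```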
