On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote:
Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF-8/UTF-16).
* s.all!(c => c == 'ö') works only with autodecoding. Without it, it always returns false.
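A minimal sketch of the claim above, for UTF-8 input. Python is used here as a stand-in: iterating a `bytes` object plays the role of D's raw code units, and iterating a `str` plays the role of autodecoded code points. `all_equal` is a hypothetical helper mirroring `s.all!(c => c == 'ö')`.

```python
def all_equal(seq, target):
    """Hypothetical stand-in for D's s.all!(c => c == target)."""
    return all(c == target for c in seq)


s = "\u00f6\u00f6"  # "öö", each "ö" as the single code point U+00F6

# Without decoding: each "ö" is two UTF-8 code units, 0xC3 0xB6,
# and neither of them equals 0xF6, so the check fails.
units = s.encode("utf-8")
print(all_equal(units, ord("\u00f6")))

# With decoding: the elements are code points, so the check succeeds.
print(all_equal(s, "\u00f6"))
```

The first call prints False and the second prints True.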
False. Many characters can be represented by different sequences of code points. For instance, ê can be encoded either as the single code point ê or as e followed by a combining circumflex (^). ö is one such character.
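To illustrate the objection: a decoded, code-point-level comparison still fails when "ö" arrives in decomposed form (o + U+0308 COMBINING DIAERESIS). Normalizing both sides to a common form such as NFC, using Python's standard `unicodedata` module, makes the canonically equivalent sequences compare equal. This is a sketch of the Unicode behavior only, not of any D library code.

```python
import unicodedata


def nfc(text):
    """Normalize to Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


precomposed = "\u00f6"  # "ö" as one code point
decomposed = "o\u0308"  # "o" followed by COMBINING DIAERESIS

# Different code point sequences, so a plain comparison fails.
print(precomposed == decomposed, len(precomposed), len(decomposed))

# Even with decoding to code points, the all() check fails on the decomposed form.
print(all(c == "\u00f6" for c in decomposed))

# After normalization, the two forms are identical.
print(nfc(precomposed) == nfc(decomposed))
print(all(c == "\u00f6" for c in nfc(decomposed)))
```

The same applies to ê: `nfc("e\u0302")` yields the single code point U+00EA.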
There are 3 levels of Unicode support. What Andrei is talking about is Level 1.
http://unicode.org/reports/tr18/tr18-5.1.html
I wonder what rationale there is for Unicode to have two different sequences of code points be treated as the same. It's madness.
There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? It's an interesting idea, but it's not how things are.
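Unicode handles this by allowing several orderings and defining which of them are equivalent. Normalization (NFD here) sorts adjacent combining marks by their canonical combining class. Marks in different classes, such as the Hebrew dagesh (class 21) and patah (class 17), therefore compare equal in either input order. Marks that share a class, such as two marks placed above a letter (both class 230), keep their order, because the order changes how they stack. The sketch below uses Python's `unicodedata`. The letter and mark choices are illustrative assumptions.

```python
import unicodedata


def nfd(text):
    """Normalize to Unicode Normalization Form D (decomposed, canonically ordered)."""
    return unicodedata.normalize("NFD", text)


BET, DAGESH, PATAH = "\u05d1", "\u05bc", "\u05b7"

# The same two Hebrew points on bet, typed in two different orders.
a = BET + DAGESH + PATAH
b = BET + PATAH + DAGESH
print(a == b)            # raw code point sequences differ
print(nfd(a) == nfd(b))  # canonical ordering makes them equal
print([unicodedata.combining(c) for c in nfd(a)])  # classes, ascending after NFD

# Two marks that both sit above the letter share class 230 and are not reordered,
# so these remain distinct even after normalization.
c = "o\u0308\u0301"  # diaeresis, then acute
d = "o\u0301\u0308"  # acute, then diaeresis
print(nfd(c) == nfd(d))
```

In the output, the Hebrew comparison is False before normalization and True after it, and the combining classes come out as [0, 17, 21]. The final comparison prints False.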