On Wed, Sep 05, 2018 at 09:33:27PM +0000, aliak via Digitalmars-d wrote:
[...]
> The dstring is only ok because the 2 code units fit in a dchar right?
> But all the other ones are as expected right?
And dstring will be wrong once you have non-precomposed diacritics and
other composing sequences (there's a small sketch at the end of this
message illustrating this).


> Seriously... why is it not graphemes by default for correctness
> whyyyyyyy!

Because grapheme decoding is SLOW, and most of the time you don't even
need it anyway.  SLOW as in, it will easily add a factor of 3-5 (if not
worse!) to your string processing time, which will make your
natively-compiled D code the laughing stock of interpreted languages
like Python.  It will make autodecoding look like an optimization(!).

Grapheme decoding is really only necessary when (1) you're typesetting
a Unicode string, or (2) you're counting the number of visual
characters taken up by the string (though grapheme counting even in
this case may not give you what you want, thanks to double-width
characters, zero-width characters, etc. -- though it can form the basis
of correct counting code).

For all other cases, you really don't need grapheme decoding, and being
forced to iterate over graphemes when unnecessary will add a horrible
overhead, worse than autodecoding does today.

//

Seriously, people need to get over the fantasy that they can just use
Unicode without understanding how Unicode works.  Most of the time you
can get the illusion that it's working, but 99% of the time the code is
actually wrong and will do the wrong thing when given an unexpected
(but still valid) Unicode string.  You can't drive without a license,
and even if you try anyway, the chances of ending up in a nasty
accident are pretty high.  People *need* to learn how to use Unicode
properly before complaining about why this or that doesn't work the way
they thought it should.


T

-- 
Gone Chopin. Bach in a minuet.
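
For concreteness, here's a minimal (untested) sketch of the
non-precomposed-diacritics point above, using Phobos' std.uni.byGrapheme
and std.range.walkLength:

    import std.range : walkLength;
    import std.stdio : writeln;
    import std.uni : byGrapheme;

    void main()
    {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT: one visual character.
        string s = "e\u0301";

        writeln(s.length);                // 3 UTF-8 code units
        writeln(s.walkLength);            // 2 code points (what autodecoding iterates)
        writeln("e\u0301"d.length);       // 2 dchars: the dstring count is off too
        writeln(s.byGrapheme.walkLength); // 1 grapheme
    }

The same string gives three different "lengths" depending on whether you
count code units, code points, or graphemes; only the grapheme count
matches what a reader actually sees.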