27-Nov-2013 18:45, David Nadlinger wrote:
> On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
>> Through Reddit I have seen this small comparison of Unicode handling
>> between different programming languages:
>>
>> http://mortoray.com/2013/11/27/the-string-type-is-broken/
>>
>> D+Phobos seem to fail most things (it produces BAFFLE):
>> http://dpaste.dzfl.pl/a5268c435

> If you need to perform these kinds of operations on Unicode strings in D,
> you can call normalize (std.uni) on the string first to make sure it is
> in one of the Normalization Forms. For example, just appending
> .normalize to your strings (which defaults to NFC) would make the code
> produce the "expected" results.
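To make the normalization advice concrete, here is a minimal sketch in Python rather than D, assuming the standard library's `unicodedata.normalize` is an acceptable stand-in for Phobos's `std.uni.normalize` (which, as noted above, defaults to NFC). The variable names are illustrative only. It spells "noël" once with the precomposed U+00EB and once as "e" followed by U+0308 COMBINING DIAERESIS; the two strings compare equal, and reverse sensibly, only after NFC normalization.

```python
import unicodedata

# "noël" spelled two ways (illustrative names, not from the original post):
composed = "no\u00ebl"      # precomposed U+00EB LATIN SMALL LETTER E WITH DIAERESIS
decomposed = "noe\u0308l"   # "e" followed by U+0308 COMBINING DIAERESIS

# Without normalization the strings differ in length and do not compare equal.
print(len(composed), len(decomposed))   # 4 5
print(composed == decomposed)           # False

# NFC (the default form of Phobos's normalize) composes e + U+0308 into U+00EB.
nfc = unicodedata.normalize("NFC", decomposed)
print(len(nfc), nfc == composed)        # 4 True

# Reversing by code point now keeps the diaeresis attached to its "e".
print(nfc[::-1])                        # lëon
```

Note that the normalization call has to walk the entire string, which is the cost the rest of the thread is concerned with.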

> As far as I'm aware, this behavior is the result of a deliberate
> decision, as normalizing strings on the fly isn't really cheap.

It's anything but cheap.
At a minimum, imagine crawling the whole string and doing a table lookup per code point.


> David


--
Dmitry Olshansky
