On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode handling between different programming languages:

http://mortoray.com/2013/11/27/the-string-type-is-broken/

Most of the points are good, but the author seems to confuse UCS-2 with UTF-16, so the whole point about UTF-16 is plain wrong.

The author also doesn't seem to understand the Unicode definitions of character and grapheme, which is a shame, because the difference is more or less the whole point of the post.

D+Phobos seems to fail most of these tests (it produces BAFFLE):
http://dpaste.dzfl.pl/a5268c435

D strings are arrays of code units and ranges of code points. The failure here is yours, in that you didn't use std.uni to handle graphemes.
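
To make that distinction concrete, here is a minimal sketch of the two views you get from a plain string (the literal and the printed counts are only an illustration):

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "noe\u0308l";   // "noël" written with a combining diaeresis (U+0308)
    writeln(s.length);         // 6 – array view: UTF-8 code units
    writeln(s.walkLength);     // 5 – range view: auto-decoded code points
    // Counting the 4 graphemes requires std.uni (graphemeStride / decodeGrapheme).
}
```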

On that note, I tried to use std.uni to write a simple example of how to correctly handle this in D, but it became apparent that std.uni should expose something like `byGrapheme`, which would lazily transform a range of code points into a range of graphemes (it probably needs a `byCodePoint` to do the converse, too). The two extant grapheme functions, `decodeGrapheme` and `graphemeStride`, are *awful* for string manipulation (granted, they are probably perfect for text rendering).
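
For concreteness, this is roughly the loop you have to write today with `graphemeStride` to do something as simple as reversing a string without tearing combining marks apart; the `reverseGraphemes` helper name is mine, not anything in Phobos:

```d
import std.stdio : writeln;
import std.uni : graphemeStride;

// Reverse a string grapheme-by-grapheme so that combining marks stay
// attached to their base characters.  Illustrative helper only.
string reverseGraphemes(string s)
{
    // graphemeStride returns the length, in code units, of the grapheme
    // cluster starting at the given index; use it to slice the string
    // into grapheme-sized chunks.
    string[] parts;
    for (size_t i = 0; i < s.length; )
    {
        immutable step = graphemeStride(s, i);
        parts ~= s[i .. i + step];
        i += step;
    }

    string result;
    foreach_reverse (p; parts)
        result ~= p;
    return result;
}

void main()
{
    writeln(reverseGraphemes("noe\u0308l")); // "lëon", not "l̈eon"
}
```

A lazy `byGrapheme` range would reduce all of the bookkeeping above to a single range pipeline.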