On Wednesday, 27 November 2013 at 12:46:38 UTC, bearophile wrote:
Through Reddit I have seen this small comparison of Unicode handling between different programming languages:

http://mortoray.com/2013/11/27/the-string-type-is-broken/

Most of the points are good, but the author seems to confuse UCS-2 with UTF-16, so the whole point about UTF-16 is plain wrong.

The author also doesn't seem to understand the Unicode definitions of character and grapheme, which is a shame, because the difference is more or less the whole point of the post.

D+Phobos seems to fail most of these tests (it produces BAFFLE):
http://dpaste.dzfl.pl/a5268c435

D strings are arrays of code units and ranges of code points. The failure here is yours, in that you didn't use std.uni to handle graphemes.
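
To make that distinction concrete, here is a minimal sketch of the two views you get from a plain string (the literal and the printed counts are only an illustration):

```d
import std.range : walkLength;
import std.stdio : writeln;

void main()
{
    string s = "noe\u0308l";   // "noël" written with a combining diaeresis (U+0308)
    writeln(s.length);         // 6 – array view: UTF-8 code units
    writeln(s.walkLength);     // 5 – range view: auto-decoded code points
    // Counting the 4 graphemes requires std.uni (graphemeStride / decodeGrapheme).
}
```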

On that note, I tried to use std.uni to write a simple example of how to correctly handle this in D, but it became apparent that std.uni should expose something like `byGrapheme`, which would lazily transform a range of code points into a range of graphemes (it probably needs a `byCodePoint` to do the converse, too). The two extant grapheme functions, `decodeGrapheme` and `graphemeStride`, are *awful* for string manipulation (granted, they are probably perfect for text rendering).
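
For concreteness, this is roughly the loop you have to write today with `graphemeStride` to do something as simple as reversing a string without tearing combining marks apart; the `reverseGraphemes` helper name is mine, not anything in Phobos:

```d
import std.stdio : writeln;
import std.uni : graphemeStride;

// Reverse a string grapheme-by-grapheme so that combining marks stay
// attached to their base characters.  Illustrative helper only.
string reverseGraphemes(string s)
{
    // graphemeStride returns the length, in code units, of the grapheme
    // cluster starting at the given index; use it to slice the string
    // into grapheme-sized chunks.
    string[] parts;
    for (size_t i = 0; i < s.length; )
    {
        immutable step = graphemeStride(s, i);
        parts ~= s[i .. i + step];
        i += step;
    }

    string result;
    foreach_reverse (p; parts)
        result ~= p;
    return result;
}

void main()
{
    writeln(reverseGraphemes("noe\u0308l")); // "lëon", not "l̈eon"
}
```

A lazy `byGrapheme` range would reduce all of the bookkeeping above to a single range pipeline.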