Re: How to correctly deal with unicode strings?

Dicebot Wed, 27 Nov 2013 06:58:40 -0800

D strings have a dual nature. They behave as arrays of code units when slicing or accessing .length directly (because of the O(1) guarantees for those operations), but all algorithms in the standard library work with them as if they were arrays of dchar:


import std.algorithm;
import std.range : walkLength, take;
import std.array : array;


void main(string[] args)
{
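        // "e" followed by U+0308 (combining diaeresis), written as an
        // escape so the literal survives editors that normalize text
        char[] x = "noe\u0308l".dup;

        assert(x.length == 6);     // UTF-8 code units
        assert(x.walkLength == 5); // code points: ë is two of them here

        assert(x[0 .. 3] == "noe".dup); // Actual: slicing works on code units.
        assert(array(take(x, 4)) == "noe\u0308"d); // ranges iterate by dchar

        x.reverse;

        assert(x == "l\u0308eon".dup); // Actual and correct!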
}

The problem you have here is that ë can be represented as two separate Unicode code points despite being a single drawn symbol. It has nothing to do with strings as arrays of code units; using an array of `dchar` will result in the same behavior.
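
To illustrate the point about `dchar`, here is a minimal sketch. It is not part of the example above, and it assumes a Phobos version that provides std.uni.byGrapheme and std.uni.normalize. The UTF-32 string still has five elements; counting graphemes, or composing the pair with NFC normalization, is what yields four.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    // "e" followed by U+0308 (combining diaeresis), stored as UTF-32.
    dstring s = "noe\u0308l"d;

    // One dchar per code point, so the decomposed ë still counts twice.
    assert(s.length == 5);

    // Graphemes are the user-perceived characters: n, o, ë, l.
    assert(s.byGrapheme.walkLength == 4);

    // NFC composes e + U+0308 into the single code point U+00EB.
    auto composed = normalize!NFC(s);
    assert(composed == "no\u00EBl"d);
    assert(composed.length == 4);
}
```

Whether normalizing or iterating by grapheme is the better fit depends on the data: NFC only helps when a precomposed code point exists for the sequence, while byGrapheme handles any combining sequence at the cost of slower iteration.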
