Re: VLERange: a range in between BidirectionalRange and RandomAccessRange

Michel Fortin Sat, 15 Jan 2011 20:45:26 -0800

On 2011-01-15 18:59:27 -0500, Andrei Alexandrescu said:

> I'm unclear on where this is converging to. At this point the commitment of the language and its standard library to (a) UTF array representation and (b) code points conceptualization is quite strong. Changing that would be quite difficult and disruptive, and the benefits are virtually nonexistent for most of D's user base.

There's still a disagreement about whether a string type or a code unit array should be the default string representation, and whether iterating on a code unit array should give you code unit or grapheme elements. Of those who participated in the discussion, I don't think anyone is disputing the idea that a grapheme element is better than a dchar element for iterating over a string.

> It may be more realistic to consider using what we have as back-end for grapheme-oriented processing.
> For example:
>
> struct Grapheme(Char) if (isSomeChar!Char)
> {
>     private const Char[] rep;
>     ...
> }
>
> auto byGrapheme(S)(S s) if (isSomeString!S)
> {
>     ...
> }
>
> string s = "Hello";
> foreach (g; byGrapheme(s))
> {
>     ...
> }

No doubt it's easier to implement it that way. The problem is that in most cases it won't be used. How many people really know what a grapheme is? Of those, how many will forget to use byGrapheme at one time or another? And so in most programs string manipulation will misbehave in the presence of combining characters or unnormalized strings.
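To make that failure mode concrete, here is a small sketch. It assumes a present-day compiler and uses `std.uni.byGrapheme` and `std.utf.byCodeUnit`, which Phobos gained after this thread; it is not the hypothetical `byGrapheme` quoted above. The same decomposed "é" is counted three ways:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

// "é" written as 'e' followed by U+0301 COMBINING ACUTE ACCENT:
// one user-perceived character, two code points, three UTF-8 code units.
enum decomposed = "e\u0301";

void main()
{
    // Iterating a string directly auto-decodes to dchar (code points).
    writeln(decomposed.walkLength);              // 2
    // byCodeUnit exposes the raw UTF-8 code units.
    writeln(decomposed.byCodeUnit.walkLength);   // 3
    // byGrapheme groups code points into grapheme clusters.
    writeln(decomposed.byGrapheme.walkLength);   // 1
}
```

Only the grapheme count matches what a reader sees on screen. Code that counts, slices or compares via the default dchar iteration silently gets 2.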

If you want to help D programmers write correct code when it comes to Unicode manipulation, you need to help them iterate on real characters (graphemes), and you need the algorithms to apply to real characters (graphemes), not the approximation of a Unicode character that is a code point.
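Reversing a string shows what it means for an algorithm to apply to graphemes rather than code points. The following is a sketch that again assumes present-day Phobos (`std.uni.byGrapheme` and `std.uni.byCodePoint`, neither of which existed when this was written). `retro` on a `string` reverses code points, while reversing the grapheme range keeps the combining diaeresis attached to its base letter:

```d
import std.array : array;
import std.conv : text;
import std.range : retro;
import std.uni : byCodePoint, byGrapheme;

void main()
{
    // "noël" with the diaeresis stored as a separate combining code point (U+0308).
    string s = "noe\u0308l";

    // Code-point reversal: U+0308 now follows 'l', so it renders on the 'l'.
    string byPoints = s.retro.text;
    assert(byPoints == "l\u0308eon");

    // Grapheme reversal: "e\u0308" is moved as one unit, giving "lëon".
    string byClusters = s.byGrapheme.array.retro.byCodePoint.text;
    assert(byClusters == "le\u0308on");
}
```

The generic `retro` is correct for the range it is given. The bug comes from handing it code points when the user means characters.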



--
Michel Fortin
http://michelf.com/
