On 01/12/2011 02:22 AM, Andrei Alexandrescu wrote:
>> IIUC, for the case of text, VLERange helps abstracting from the annoying
>> fact that a codepoint is encoded as a variable number of code units.
>> What I meant is issues like:
>>
>> auto text = "a\u0302"d;
>> writeln(text); // "â"
>> auto range = VLERange(text);
>> // extracts characters correctly?
>> auto letter = range.front(); // "a" or "â"?
>> // case yes: compares correctly?
>> assert(range.front() == "â"); // fail or pass?
>
> You should try text.front right now, you might be surprised :o).

Hum, right now it incorrectly returns "a", as expected. And indeed
        assert ("â" == "a\u0302");
incorrectly fails as expected.
Both would work with legacy charsets like latin-1. This is a new issue introduced with UCS, one that requires an additional level of abstraction (on top of the one already required by the codepoint/code unit distinction!).
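
To make the two levels concrete, here is a minimal sketch. It assumes a present-day Phobos in which std.uni provides byGrapheme and normalize; the helper names graphemeCount and canonicallyEqual are hypothetical and only serve the illustration.

```d
import std.range : front, walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helper: counts user-perceived characters (grapheme clusters),
// not codepoints.
size_t graphemeCount(dstring s)
{
    return s.byGrapheme.walkLength;
}

// Hypothetical helper: compares two strings after bringing both to the same
// canonical form (NFC), so decomposed and precomposed spellings match.
bool canonicallyEqual(dstring a, dstring b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

void main()
{
    dstring decomposed  = "a\u0302"d; // 'a' + U+0302 COMBINING CIRCUMFLEX ACCENT
    dstring precomposed = "\u00E2"d;  // U+00E2, "â" as a single codepoint

    // Codepoint level: front yields the base letter only.
    assert(decomposed.front == 'a');
    // Codepoint level: the two spellings of "â" compare unequal.
    assert(decomposed != precomposed);

    // Character level: one grapheme, and equality after normalization.
    assert(graphemeCount(decomposed) == 1);
    assert(canonicallyEqual(decomposed, precomposed));
}
```

Normalizing on every comparison is the simplest thing to write, not necessarily the cheapest; a dedicated text type could normalize once at construction instead.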

You may have a look at https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/Text.html for a rough implementation of a type that does the right thing, and at https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/U%20missing%20level%20of%20abstraction for a (far too long) explanation. (I have tried to mention those problems a dozen times already, but for some reason nearly everybody seems deaf to them.)


Denis
_________________
vita es estrany
spir.wikidot.com
