On 01/12/2011 02:22 AM, Andrei Alexandrescu wrote:
IIUC, for the case of text, VLERange helps abstract away the annoying
fact that a codepoint is encoded as a variable number of code units.
What I meant are issues like:
auto text = "a\u0302"d;
writeln(text); // "â"
auto range = VLERange(text);
// extracts characters correctly?
auto letter = range.front(); // "a" or "â"?
// if so: does it compare correctly?
assert(range.front() == "â"); // fail or pass?
You should try text.front right now, you might be surprised :o).
Hum, right now it incorrectly returns "a", as expected. And indeed
assert ("â" == "a\u0302");
incorrectly fails as expected.
Both would work with legacy charsets like latin-1. This is a new issue
introduced with UCS, one that requires an additional level of abstraction
(on top of the one required by the codepoint/codeunit distinction!).
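To make that extra level concrete, here is a minimal sketch. It assumes a
Phobos release whose std.uni provides normalize and byGrapheme (neither is
mentioned above, and they may not exist in the version discussed here). The
helper names sameText and characterCount are made up for illustration only.

```d
import std.range : walkLength;
import std.uni : NFC, byGrapheme, normalize;

/// Hypothetical helper: compares two strings after normalizing both to NFC,
/// so precomposed and decomposed spellings of the same text compare equal.
bool sameText(dstring a, dstring b)
{
    return normalize!NFC(a) == normalize!NFC(b);
}

/// Hypothetical helper: counts user-perceived characters (grapheme clusters)
/// rather than code points.
size_t characterCount(dstring s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    dstring decomposed = "a\u0302"d;   // 'a' followed by a combining circumflex
    dstring precomposed = "\u00E2"d;   // the single code point 'â'

    assert(decomposed.length == 2);            // two code points
    assert(decomposed != precomposed);         // plain code point comparison fails
    assert(sameText(decomposed, precomposed)); // normalized comparison passes
    assert(characterCount(decomposed) == 1);   // but only one character for the reader
}
```

Normalizing on every comparison is wasteful; a dedicated text type could
normalize once at construction and then index by character, which is roughly
the design choice argued for here.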
You may have a look at
https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/Text.html for
a rough implementation of a type that does the right thing, and at
https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/U%20missing%20level%20of%20abstraction
for a (far too long) explanation.
(I have tried to mention those problems a dozen times already, but for
whatever reason nearly everybody seems completely deaf to them.)
Denis
_________________
vita es estrany
spir.wikidot.com