Nick Sabalausky wrote:
> "Andrei Alexandrescu" <[email protected]> wrote in message
> news:[email protected]...
>>
>> This may sometimes not be what the user expected; most of the time they'd
>> care about the code points.
>>
>
> I dunno, spir has successfully convinced me that most of the time it's
> graphemes the user cares about, not code points. Using code points is just
> as misleading as using UTF-16 code units.

I agree. This is a very informative thread, thanks spir and everybody else.

Going back to the topic, it seems to me that a Unicode string is a surprisingly complicated data structure that can be viewed through several kinds of ranges: code units, code points, or graphemes. In the light of this thread, dchar doesn't seem like such a useful type anymore; it is still too low-level an abstraction for dealing with text correctly. Perhaps it is even less useful than that, since it gives an illusion of correctness to those who are not in the know.

The algorithms in std.string can be upgraded to work correctly with all the issues mentioned, but the generic ones in std.algorithm will just subtly do the wrong thing when presented with dchar ranges. And, as I understood it, the purpose of a VleRange was exactly to make generic algorithms just work (tm) for strings.

Is it still possible to solve this problem, or are we stuck with specialized string algorithms? Would it work if a VleRange of string were a bidirectional range with string slices of graphemes as the ElementType, indexed by code units? Frequently used string algorithms could be specialized for performance, and where they are not, the generic algorithms would still work.
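To make the idea concrete, here is a rough sketch of such a range. The name GraphemeSlices is hypothetical, and this is not the VleRange design itself. It assumes std.uni.graphemeStride from current Phobos, which postdates this thread. It is forward-only (a bidirectional version would also need backward grapheme segmentation), and it leaves out indexing by code units.

```d
import std.algorithm.comparison : equal;
import std.algorithm.searching : any;
import std.range : walkLength;
import std.uni : graphemeStride;

/// Hypothetical forward range whose elements are slices of the original
/// string, one slice per grapheme. Illustration only, not a Phobos type.
struct GraphemeSlices
{
    string data;

    @property bool empty() { return data.length == 0; }

    /// The first grapheme, as a slice of the underlying code units.
    @property string front()
    {
        return data[0 .. graphemeStride(data, 0)];
    }

    void popFront()
    {
        data = data[graphemeStride(data, 0) .. $];
    }

    @property GraphemeSlices save() { return this; }
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT, then 'a':
    // two user-perceived characters.
    string s = "e\u0301a";

    assert(s.length == 4);      // UTF-8 code units
    assert(s.walkLength == 3);  // code points (string iterated as dchar)

    auto r = GraphemeSlices(s);
    assert(r.walkLength == 2);  // graphemes
    assert(equal(r, ["e\u0301", "a"]));

    // A generic search over code points finds a bare 'e' that the user
    // never typed; the same search over grapheme slices does not.
    assert(s.any!(c => c == 'e'));
    assert(!r.any!(g => g == "e"));
}
```

Because front returns a slice rather than a copy, generic algorithms such as walkLength, equal and any work on it unchanged, and a specialized string algorithm could still drop down to the code units in data when performance matters.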
