On Thursday, April 26, 2012 13:51:17 Nick Sabalausky wrote:
> Also, keep in mind that (unless I'm mistaken) walkLength does *not* return
> the number of "characters" (ie, graphemes), but merely the number of code
> points - which is not the same thing (due to existence of the
> [confusingly-named] "combining characters").

You're not mistaken. Nothing in Phobos (save perhaps some of std.regex's internals) deals with graphemes. It all operates on code points, and strings are considered to be ranges of code points, not graphemes. So, as far as ranges go, walkLength returns the actual length of the range. That's _usually_ the number of characters/graphemes as well, but not always: any string containing combining characters will give a higher count.

We'll need further Unicode facilities in Phobos to deal with that, though, and I doubt that strings will ever change to be treated as ranges of graphemes, since that would be incredibly expensive computationally. We have enough performance problems with strings as it is. What we'll probably get is extra functions to deal with normalization (and probably something to count the number of graphemes) and probably a wrapper type that does deal in graphemes.

Regardless, you're right about walkLength returning the number of code points rather than graphemes, because strings are considered to be ranges of dchar.

- Jonathan M Davis
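
To make the distinction concrete, here is a minimal sketch of the difference between code units, code points, and what a reader would call a character. The helper names codeUnits and codePoints are hypothetical, chosen only for illustration; the sketch assumes walkLength is imported from std.range and that strings are iterated as ranges of dchar, as described above.

```d
import std.range : walkLength;

// Hypothetical helper: number of UTF-8 code units, i.e. the array length.
size_t codeUnits(string s)
{
    return s.length;
}

// Hypothetical helper: number of code points, since a string is
// walked as a range of dchar.
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    // "é" as a single precomposed code point (U+00E9).
    string precomposed = "\u00E9";
    // "é" as 'e' followed by a combining acute accent (U+0301).
    string combined = "e\u0301";

    assert(codeUnits(precomposed) == 2);   // two UTF-8 bytes
    assert(codePoints(precomposed) == 1);  // one code point

    assert(codeUnits(combined) == 3);      // one byte for 'e', two for U+0301
    assert(codePoints(combined) == 2);     // two code points, one grapheme
}
```

Both strings display as a single character, yet walkLength gives 1 for the first and 2 for the second, so it cannot be relied on as a grapheme count.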
