Marco Leise writes:

> On Thu, 09 Jan 2014 15:20:13 +0000,
> "John Colvin" wrote:
>
> The point about graphemes is good. D's functions still stop
> mid-way. From UTF-8 you can iterate UTF-32 code points, but
> grapheme clusters are the new characters. I.e. the basic need
> to iterate Unicode _characters_ is not supported!
> I cannot even come up with use cases for working with code
> points and think they are a conceptual black hole. Something
> carried over from a time when grapheme clusters didn't exist.

Actually, you can do tons of NLP without grapheme clusters. If you're paranoid, you standardize on a specific Unicode normalization form first. You can probably get slightly better results by paying attention to clusters, but I suspect the improvement would be marginal.

That said, I do agree with the OP that the string API is currently harder to understand than I'd like. However, for anything beyond ASCII it's significantly easier to use than what's in standard C++.

Jerry
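A minimal D sketch of the levels argued about above (UTF-8 code units, code points, grapheme clusters), and of what normalizing first does and does not buy you. It assumes a current Phobos where `std.uni` provides `byGrapheme` and `normalize!NFC`; the helper names `codeUnits`, `codePoints` and `graphemes` and the sample strings are hypothetical, chosen only for illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helpers: count the same text at three different levels.
size_t codeUnits(string s) { return s.length; }                // UTF-8 bytes
size_t codePoints(string s) { return s.walkLength; }           // string auto-decodes to dchar
size_t graphemes(string s) { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    // "noël" spelled as 'e' followed by U+0308 COMBINING DIAERESIS
    string decomposed = "noe\u0308l";
    writeln(codeUnits(decomposed));  // 6
    writeln(codePoints(decomposed)); // 5
    writeln(graphemes(decomposed));  // 4

    // NFC folds e + U+0308 into the precomposed U+00EB,
    // so code points and graphemes line up again for this word.
    string composed = normalize!NFC(decomposed);
    writeln(codePoints(composed));   // 4

    // 'g' + U+0303 COMBINING TILDE has no precomposed form,
    // so NFC still leaves two code points for one grapheme.
    string gTilde = normalize!NFC("g\u0303");
    writeln(codePoints(gTilde));     // 2
    writeln(graphemes(gTilde));      // 1
}
```

Normalizing first covers the common case where a precomposed character exists; iterating with `byGrapheme` is what handles the remaining sequences that have no single-code-point form.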
