On Sunday, 5 October 2014 at 08:27:58 UTC, Uranuz wrote:
I have a struct StringStream that I use to walk through and parse an input string. The string can be of type string, wstring or dstring. I implemented a function popChar that reads a code unit from the stream. I want a *debug* mode for the parser (via a compile-time switch) where I can get the lineIndex, codeUnitIndex and graphemeIndex. I don't want to use the *front* primitive because it autodecodes everywhere, but in debug mode I do want the index of the *user-perceived character* (so decoding is needed there).

The question is how to detect that I have moved from one Unicode grapheme to the next when iterating over a string, wstring or dstring by code unit. Is that simple, or is it an attempt to reimplement a big piece of existing standard library code?

You can use std.uni.byGrapheme to iterate by graphemes:
http://dlang.org/phobos/std_uni.html#.byGrapheme
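
A minimal sketch of counting user-perceived characters with byGrapheme. The helper name graphemeCount is made up for illustration; it relies on the usual autodecoding of string, wstring and dstring into a range of dchar, which is what byGrapheme expects:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of grapheme clusters in `text`.
/// Works for string, wstring and dstring because each is iterated as dchar.
size_t graphemeCount(S)(S text)
{
    return text.byGrapheme.walkLength;
}
```

For example, "noe\u0308l" (an e followed by a combining diaeresis) has 5 code points but graphemeCount reports 4.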

AFAIK, graphemes are not "self-synchronizing", but code points are. If you land in the middle of a string, you can pop code units until you reach the start of a new code point. From there you can iterate by graphemes, though the first grapheme might be off if you started in the middle of a cluster.
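
A rough sketch of both ideas, under a few assumptions. The names isTrailingUnit, syncToCodePoint and GraphemeTracker are hypothetical, not part of Phobos. The tracker uses std.uni.graphemeStride (not mentioned above), which returns the length in code units of the grapheme cluster starting at a given index, so the parser can keep popping code units and only bump a counter when it crosses a cluster boundary:

```d
import std.uni : graphemeStride;

/// Hypothetical helper: true if `unit` cannot be the first unit of a code point.
bool isTrailingUnit(C)(C unit)
{
    static if (C.sizeof == 1)       // UTF-8 continuation byte: 10xxxxxx
        return (unit & 0xC0) == 0x80;
    else static if (C.sizeof == 2)  // UTF-16 low (trailing) surrogate
        return unit >= 0xDC00 && unit <= 0xDFFF;
    else                            // UTF-32: every unit is a whole code point
        return false;
}

/// Hypothetical helper: first code point boundary at or after index `i`.
size_t syncToCodePoint(C)(const(C)[] input, size_t i)
{
    while (i < input.length && isTrailingUnit(input[i]))
        ++i;
    return i;
}

/// Hypothetical debug counter: pops one code unit at a time and counts
/// a grapheme once all of its code units have been consumed.
struct GraphemeTracker(C)
{
    const(C)[] input;
    size_t unitIndex;      // index of the next code unit to pop
    size_t graphemeIndex;  // number of graphemes fully consumed so far
    size_t nextBoundary;   // code unit index where the current grapheme ends

    this(const(C)[] input)
    {
        this.input = input;
        if (input.length)
            nextBoundary = graphemeStride(input, 0);
    }

    /// Assumes the caller has checked that unitIndex < input.length.
    C popUnit()
    {
        C unit = input[unitIndex++];
        if (unitIndex == nextBoundary)
        {
            ++graphemeIndex;
            if (unitIndex < input.length)
                nextBoundary = unitIndex + graphemeStride(input, unitIndex);
        }
        return unit;
    }
}
```

The tracker assumes it starts at a grapheme boundary (index 0 here). If it were started from an arbitrary offset after syncToCodePoint, the first counted grapheme could be a partial one, as noted above.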
