On Fri, 10 Jan 2014 18:07:54 +0100, Jacob Carlborg <[email protected]> wrote:
> On 2014-01-10 17:01, Marco Leise wrote:
>
> > Sorry, I got confused with the Unicode definitions. I see now
> > that a grapheme cluster is e.g. \r\n. What I really meant is
> > that Phobos needs to support graphemes. But seeing that
> > monsters like this exist: n͠g, I don't even know if this is
> > one character or two, but right now Phobos sees it as three
> > characters.
>
> Thunderbird sees that as two characters. Ruby sees it as three.

I think this is the (or one of the) official documents about
where a "user-perceived character" ends:
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

According to this, the above n͠g is indeed defined as 2
characters. Ruby is just no better than Phobos :p

»Grapheme cluster boundaries are important for collation,
regular expressions, UI interactions (such as mouse selection,
arrow key movement, backspacing), segmentation for vertical
text, identification of boundaries for first-letter styling,
and counting “character” positions within text.«

--
Marco
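
To make the difference between code points and grapheme clusters concrete, here is a rough Python sketch. It is not how Phobos, Ruby or Thunderbird count, and it is not a full implementation of the TR29 rules. The function name grapheme_clusters is made up for illustration, and it only handles two cases: CR followed by LF stays together, and a combining mark (approximated here by the general categories Mn and Me) attaches to the preceding character.

```python
import unicodedata


def grapheme_clusters(text):
    """Split text into approximate grapheme clusters.

    Deliberately simplified: only CR+LF and trailing combining marks
    (general categories Mn/Me) are kept together. The real TR29 rules
    cover many more cases (Hangul syllables, emoji sequences, ...).
    """
    clusters = []
    for ch in text:
        if clusters:
            prev = clusters[-1]
            # CR followed by LF forms a single cluster.
            is_crlf = prev == "\r" and ch == "\n"
            # A combining mark extends the previous cluster,
            # unless that cluster ends in a control character.
            is_mark = (
                unicodedata.category(ch) in ("Mn", "Me")
                and unicodedata.category(prev[-1]) != "Cc"
            )
            if is_crlf or is_mark:
                clusters[-1] += ch
                continue
        clusters.append(ch)
    return clusters


# n, U+0360 COMBINING DOUBLE TILDE, g
s = "n\u0360g"
print(len(s))                     # 3 code points
print(len(grapheme_clusters(s)))  # 2 user-perceived characters
```

Counting code points gives three, which is what Phobos and Ruby reported; grouping the combining tilde with the n gives two, which matches the TR29 reading and Thunderbird.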
