> there are still combinations of two code points that map to only one glyph (eg c̦

?!? What the ...

Wikipedia (emphases mine):

> Unicode is a computing industry standard for the *consistent* encoding, representation, and handling of text expressed in most of the world's writing systems. In text processing, Unicode takes the role of providing a *unique* code point—a number, not a glyph—for each *character*.

Whether or not it is truly consistent depends on their interpretation of "character", because this section, https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters, talks about "main characters" and "diacritical marks" combining to make what earlier sections call "abstract characters".

I personally believe it's a bad approach: not only does it make things harder for the computing industry, it is also inconsistent in principle with the treatment of most, if not all, other characters (e.g. A is made of 3 bars, B of 1 bar and 2 semicircles or partial circles...). Thus `(almost) any visible character can be regarded as a combination of some small, primitive "marks" (and historically probably evolved that way).` Not a perfect standard at all.

---

But my practical take-away is still that a character (and I mean a visible character, including the example of c̦) on the screen is represented by 1 or, for "complex" characters, more bytes. And the caret tries to step in between those bytes, sometimes ending up in "illegal positions" and not showing up.
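
To make the byte vs. code point distinction concrete, here is a minimal Python sketch. It assumes the example character is `c` followed by U+0326 COMBINING COMMA BELOW and that the buffer is UTF-8. `caret_stops` is a hypothetical helper written for illustration, not Geany or Scintilla code, and it only approximates "visible characters" by attaching combining marks to the preceding base character.

```python
import unicodedata

# Assumption: the example is "c" + U+0326 COMBINING COMMA BELOW,
# which renders as one visible character.
text = "c\u0326"

# Two code points...
code_points = [f"U+{ord(ch):04X}" for ch in text]

# ...stored as three bytes in UTF-8 (1 for "c", 2 for the combining mark).
utf8 = text.encode("utf-8")

# Byte offsets where a code point starts. Any other offset lies inside
# a multi-byte sequence.
boundaries = [i for i, b in enumerate(utf8) if (b & 0xC0) != 0x80]


def caret_stops(s: str) -> list[int]:
    """Hypothetical helper: UTF-8 byte offsets that do not split a base
    character from the combining marks that follow it."""
    stops = []
    offset = 0
    for ch in s:
        # combining() returns 0 for base characters, non-zero for marks.
        if unicodedata.combining(ch) == 0:
            stops.append(offset)
        offset += len(ch.encode("utf-8"))
    stops.append(offset)  # the end of the text is always a valid stop
    return stops


print(code_points)
print(len(utf8), boundaries)
print(caret_stops(text))
```

Under these assumptions, offset 1 is a valid code point boundary but falls between the base letter and its mark, and offset 2 is in the middle of the mark's two-byte encoding. Only offsets 0 and 3 leave the visible character intact. A full implementation would follow the grapheme cluster rules of Unicode Standard Annex #29 rather than this approximation.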
