>there are still combinations of two code points that map to only one glyph (eg 
>c̦ 

?!? what the ....
Wikipedia (emphases mine):
> Unicode is a computing industry standard for the *consistent* encoding, 
> representation, and handling of text expressed in most of the world's writing 
> systems.
In text processing, Unicode takes the role of providing a *unique* code point—a 
number, not a glyph—for each *character*

Whether or not it is truly consistent depends on their interpretation of 
"character", because this section 
https://en.wikipedia.org/wiki/Unicode#Ready-made_versus_composite_characters 
talks about "main characters" and "diacritical marks" combining to make what 
they call in earlier sections "abstract characters".
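
For what it's worth, the ready-made versus composite distinction is easy to poke at with Python's standard `unicodedata` module. This is only a rough sketch, and it assumes the c̦ quoted above is `c` followed by U+0326 COMBINING COMMA BELOW (the helper name `describe` is mine, purely for illustration):

```python
import unicodedata

# Composite sequence that has a ready-made equivalent: e + COMBINING ACUTE ACCENT.
e_acute = "e\u0301"
# Assumed spelling of the example from this thread: c + COMBINING COMMA BELOW.
c_comma = "c\u0326"

def describe(text):
    """Return the code point names before and after NFC normalization."""
    composed = unicodedata.normalize("NFC", text)
    return {
        "before": [unicodedata.name(ch) for ch in text],
        "after": [unicodedata.name(ch) for ch in composed],
    }

e_info = describe(e_acute)
c_info = describe(c_comma)
print(e_info)
print(c_info)
```

NFC folds the first sequence into a single ready-made code point, while the second stays as two code points because no ready-made form exists for it.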

I personally believe it's a bad approach: not only does it make things harder 
for the computing industry, it is also inconsistent in principle with the 
treatment of most, if not all, characters (e.g. A is made of 3 bars, B of 1 bar 
and 2 semicircles or partial circles...). Thus
`(almost) any visible character can be regarded as a combination of some small, 
primitive "marks" (and historically probably evolved that way).`

Not a perfect standard at all.  

---
But my practical takeaway is still that a character (and I mean a visible 
character, including the example of c̦) on the screen is represented by 1 or, for 
"complex" characters, more code points, each of which takes one or more bytes. 
And the caret tries to step in between those code points, sometimes ending up 
in "illegal positions" and not showing up.

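To make the "illegal positions" idea concrete, here is a minimal sketch that lists the byte offsets where a caret could sensibly rest. It is not how Geany or Scintilla actually does it; the function name `caret_stops` is hypothetical, and the rule used (skip any position directly before a combining mark) is only an approximation of real grapheme segmentation. It again assumes c̦ is `c` + U+0326:

```python
import unicodedata

def caret_stops(text):
    """Return UTF-8 byte offsets where a caret could sensibly rest.

    Approximation: a position is skipped when the code point that follows
    it is a combining mark (nonzero canonical combining class). Real
    grapheme segmentation (UAX #29) has more rules than this.
    """
    stops = []
    offset = 0
    for ch in text:
        if unicodedata.combining(ch) == 0:
            stops.append(offset)  # legal: a new visible character starts here
        offset += len(ch.encode("utf-8"))
    stops.append(offset)  # end of text is always a legal position
    return stops

sample = "ac\u0326b"  # a, then c + COMBINING COMMA BELOW (assumed), then b
print(len(sample), len(sample.encode("utf-8")))  # code points vs. bytes
print(caret_stops(sample))
```

For this sample there are 4 code points in 5 bytes, and byte offset 2 (between the `c` and its combining mark) is left out of the list of stops.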


Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/issues/1952#issuecomment-421758259
