On 21/10/2016 22:16, Juha Manninen via Lazarus wrote:
UTF-16. It does not support all the complex rules of combining
CodePoints, but it apparently works well for accented characters in
western languages.


Which ones does it not support?
When I added it to SynEdit it was complete. It had all the combining ranges that the Unicode standard defined back then (at least all that I could find in the documentation).

Of course, if a new combining range is added to the standard later, it will not be covered. If that is needed, one needs an external library (OS or otherwise) that can/will be updated on those occasions.

Mind that "combining codepoints" have nothing to do with how many codepoints are represented by one glyph.

"â" is one character. But it can be a single codepoint (in UTF-16 one code unit, i.e. one word; in UTF-8 two code units, i.e. bytes), or 2 codepoints ("a" + combining "^").
"fi" is 2 chars. But they may be rendered as 2 glyphs or as 1 glyph (ligature).

It is my understanding (but I do not know for sure) that in some languages (such as Arabic) certain letter combinations form a single glyph (afaik/google, see https://en.wikipedia.org/wiki/Hamzah combined with a letter). Though maybe it is considered 2 glyphs? I do not know Arabic at all. Also, in some scripts glyphs are displayed in an order different from their occurrence in the text. All of this, however, has nothing to do with combining codepoints, or with what counts as a character.
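
To make the "â" case above concrete, here is a minimal sketch in Python (used only for illustration; this is not how SynEdit does it). It counts codepoints, UTF-16 code units and UTF-8 bytes for both spellings of "â". The names describe, precomposed and decomposed are made up for this example.

```python
import unicodedata

precomposed = "\u00e2"   # "â" as a single codepoint, U+00E2
decomposed = "a\u0302"   # "a" followed by U+0302 COMBINING CIRCUMFLEX ACCENT


def describe(s):
    """Return (codepoints, UTF-16 code units, UTF-8 bytes) for the string s."""
    codepoints = len(s)
    utf16_units = len(s.encode("utf-16-le")) // 2  # 2 bytes per code unit
    utf8_bytes = len(s.encode("utf-8"))
    return (codepoints, utf16_units, utf8_bytes)


print(describe(precomposed))  # (1, 1, 2)
print(describe(decomposed))   # (2, 2, 3)

# A non-zero combining class marks a codepoint as a combining mark.
print(unicodedata.combining("\u0302"))  # 230
print(unicodedata.combining("a"))       # 0

# Both spellings are the same character; NFC normalization composes them.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

Both strings are the same single character to the user, yet they differ in codepoints, code units and bytes. Whether "fi" ends up as one ligature glyph is decided by the font and the renderer, so it does not show up in any of these counts.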

--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus
