On 21/10/2016 22:16, Juha Manninen via Lazarus wrote:
> UTF-16. It does not support all the complex rules of combining
> CodePoints, but it apparently works well for accented characters in
> western languages.
Which ones does it not support?
When I added it to SynEdit it was complete. It had all the combining
codepoints that the Unicode standard defined back then (at least all
that I could find in the documentation).
Of course, if a new combining range is added, it will not contain it.
If that is needed, one needs an external (OS or otherwise) library that
can/will be updated on those occasions.
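To illustrate the trade-off, here is a minimal sketch in Python. The
range table below is hypothetical (a few well-known Unicode blocks of
combining marks); it is NOT the table SynEdit uses, and the function
names are made up for this example. The second function asks a library
that ships its own, updatable copy of the Unicode data.

```python
import unicodedata

# Hypothetical fixed table: a few Unicode blocks made up of combining
# marks. Illustrative only; this is NOT the table SynEdit uses.
FIXED_COMBINING_RANGES = [
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1AB0, 0x1AFF),  # Combining Diacritical Marks Extended
    (0x1DC0, 0x1DFF),  # Combining Diacritical Marks Supplement
    (0x20D0, 0x20FF),  # Combining Diacritical Marks for Symbols
    (0xFE20, 0xFE2F),  # Combining Half Marks
]


def is_combining_fixed(ch):
    """Check a single character against the hard-coded range table."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in FIXED_COMBINING_RANGES)


def is_combining_library(ch):
    """Ask the library's Unicode database instead of a fixed table.

    General categories Mn, Mc and Me are the "mark" categories.
    """
    return unicodedata.category(ch).startswith("M")
```

Both functions agree on U+0302 (combining circumflex). For U+0591 (a
Hebrew accent with general category Mn), which lies outside the
hard-coded ranges, only the library-based check reports a combining
mark. A fixed table is only as complete as the day it was written.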
Mind that "combining codepoints" have nothing to do with how many
codepoints are represented by one glyph.
"â" is one character. But it can be a single codepoint (in UTF-16 one
code unit, i.e. one word; in UTF-8 several code units, i.e. bytes), or
2 codepoints ("a" + combining "^").
"fi" is 2 chars. But they may be rendered as 2 glyphs or as 1 glyph (a
ligature).
It is my understanding (but I do not know for sure) that in some
languages (such as Arabic) certain letter combinations form a single
glyph (afaik/google see https://en.wikipedia.org/wiki/Hamzah combined
with a letter). Though maybe it is considered 2 glyphs? I do not know
Arabic at all.
Also in some scripts glyphs are displayed in an order different from
their occurrence in the text.
All of this, however, has nothing to do with combining codepoints, or
with what counts as a character.
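To make the "â" example above concrete, here is a small sketch in
Python (the helper name count_units is made up for this example). It
counts codepoints, UTF-16 code units and UTF-8 code units for both
spellings of "â", and shows that Unicode normalization (NFC) maps the
two-codepoint form to the single-codepoint one.

```python
import unicodedata

PRECOMPOSED = "\u00e2"   # "â" as a single codepoint
DECOMPOSED = "a\u0302"   # "a" + COMBINING CIRCUMFLEX ACCENT


def count_units(text):
    """Return (codepoints, UTF-16 code units, UTF-8 code units)."""
    codepoints = len(text)                          # Python str = codepoints
    utf16_units = len(text.encode("utf-16-le")) // 2  # 2 bytes per unit
    utf8_units = len(text.encode("utf-8"))            # 1 byte per unit
    return codepoints, utf16_units, utf8_units


print(count_units(PRECOMPOSED))   # (1, 1, 2)
print(count_units(DECOMPOSED))    # (2, 2, 3)

# Both spellings are the same character to a reader; NFC composes the
# two-codepoint form into the single-codepoint one.
print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)   # True
```

None of these counts says anything about glyphs: whether "fi" is drawn
as one ligature or two glyphs is decided by the font and the text
shaping, not by the encoding.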
--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
http://lists.lazarus-ide.org/listinfo/lazarus