On 16/08/2017 13:37, Alexey via Lazarus wrote:
On 16.08.2017 15:30, Martin Frb via Lazarus wrote:

A char can be composed of several code points, a base one plus combining ones (each of them, AFAIK, in the 32-bit range), so a char can take 96 or more bits. (And not all such sequences have a precomposed form.)
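As a small illustration of the point about combined forms, here is a sketch using Python's standard unicodedata module. The two sample sequences are my own choice, not taken from the thread: "a" plus a combining diaeresis has a precomposed code point, while "q" plus the same mark, as far as I know, does not, so NFC normalization leaves it as two code points.

```python
import unicodedata

# "a" + COMBINING DIAERESIS (U+0308): a precomposed form exists (U+00E4).
decomposed_a = "a\u0308"
composed_a = unicodedata.normalize("NFC", decomposed_a)

# "q" + COMBINING DIAERESIS: no precomposed form, so NFC cannot shrink it.
decomposed_q = "q\u0308"
composed_q = unicodedata.normalize("NFC", decomposed_q)

print(len(decomposed_a), len(composed_a))
print(len(decomposed_q), len(composed_q))
```

Both sequences would normally be displayed as one char, yet only the first can ever fit into a single code point.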

See my previous post: I think each S[i] could be something like a QWord (SizeOf(one char) = SizeOf(QWord)). The element type could be TextChar and the string type TextString; internally it can be compressed to UTF-8. TextString is good if I want to parse text by "chars". If a "char" needs more bytes, let's take more (internally it is still the same UTF-8).


Have a look at https://www.reddit.com/r/Unicode/comments/4yie0a/tallest_longest_unicode_character/

There is ONE character that comprises more than 200 codepoints.
The only way to store such a char is in a type of dynamic size (aka a string).
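To put rough numbers on this, here is a minimal sketch that builds an artificial char from one base letter and 200 combining marks and measures its size. The mark (U+0301) and the count of 200 are illustrative assumptions, not the actual sequence from the linked post.

```python
# One base letter followed by 200 combining acute accents (U+0301).
# The count is illustrative, matching the order of magnitude mentioned above.
tall_char = "a" + "\u0301" * 200

code_points = len(tall_char)                       # number of code points
utf8_bytes = len(tall_char.encode("utf-8"))        # size when stored as UTF-8
utf32_bytes = len(tall_char.encode("utf-32-le"))   # size at 4 bytes per code point

QWORD_BYTES = 8  # size of a fixed 64-bit element

print(code_points, utf8_bytes, utf32_bytes)
print(utf8_bytes > QWORD_BYTES)
```

Whatever fixed element size is picked, a longer combining sequence can be constructed that exceeds it, which is why a dynamically sized type is needed.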

Well, I couldn't find an official document that defines the boundaries of a char.

But as far as I can see: if ä is one character, and it can be encoded as "non-combining codepoint" + "combining codepoint", then a character is any sequence of one "non-combining codepoint" + zero or more "combining codepoints". (AFAIK Arabic script has chars that carry several "combining codepoints", so this happens in actual languages.)

The example from the link above, as far as I checked, fulfils this definition.
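The working definition above (one non-combining codepoint followed by zero or more combining codepoints) can be sketched in a few lines. This is only an illustration of that simplified rule, assuming that "combining" means a non-zero canonical combining class as reported by Python's unicodedata.combining(). The function name split_chars is made up for this example. The Unicode text segmentation rules (UAX #29) are more involved, and some marks have combining class 0, so this sketch would treat them as base codepoints.

```python
import unicodedata

def split_chars(text):
    """Split text into 'chars': one non-combining code point plus any
    combining code points that directly follow it (simplified rule)."""
    chars = []
    for cp in text:
        # unicodedata.combining() returns 0 for non-combining code points.
        if unicodedata.combining(cp) != 0 and chars:
            chars[-1] += cp   # attach the mark to the current char
        else:
            chars.append(cp)  # start a new char
    return chars

# "a" + diaeresis, plain "b", "c" + acute accent + dot below
sample = "a\u0308bc\u0301\u0323"
parts = split_chars(sample)
print(len(sample), len(parts))
```

Each element of the result is itself a string of variable length, which fits the point that a char needs a dynamically sized type.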

--
_______________________________________________
Lazarus mailing list
Lazarus@lists.lazarus-ide.org
https://lists.lazarus-ide.org/listinfo/lazarus
