Re: [fpc-devel] Memory consumed by strings

Daniël Mantione Sun, 23 Nov 2008 04:50:03 -0800


On Sun, 23 Nov 2008, Jonas Maebe wrote:

On 23 Nov 2008, at 13:31, Daniël Mantione wrote:
For an IDE, this is a little bit more complicated. I.e. searching for a ç in a source file needs to find both the composed and the decomposed variant, and in the case of UTF-8, this character can be encoded in 1, 2, 3 or 4 bytes which all need to be found. This is where UTF-16 and UTF-32 start to make sense.
Characters can also be decomposed in UTF-16 and in UTF-32 (for the same reasons as in UTF-8).

I am aware of that, but the combining cedilla is not in the "easy to process range" of UTF-8. In other words, you cannot do

"if char[i]=combining_cedille" in UTF-8.

Instead, in UTF-8, you need to make sure the string has enough characters left, and then compare multiple characters. Heck, you even need to take care of the fact that the combining cedilla can be encoded in 2, 3 or 4 bytes.
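
To make the difference concrete, here is a minimal sketch in Python (not FPC code, and not taken from any existing library). The names cedilla_at_utf8, cedilla_at_utf16 and utf16_units are hypothetical, chosen only for this illustration. It assumes the well-formed encoding of the combining cedilla (U+0327), which is the two bytes CC A7 in UTF-8 and a single code unit in UTF-16. It does not attempt to handle any other byte sequences.

```python
# Combining cedilla, U+0327, as it appears in each encoding.
COMBINING_CEDILLA = "\u0327"
CEDILLA_UTF8 = COMBINING_CEDILLA.encode("utf-8")  # b"\xcc\xa7", two bytes
CEDILLA_UTF16 = 0x0327                            # one 16-bit code unit


def cedilla_at_utf8(buf, i):
    """Check for the combining cedilla at byte offset i of a UTF-8 buffer."""
    n = len(CEDILLA_UTF8)
    # First make sure enough bytes are left, then compare several bytes.
    return i + n <= len(buf) and buf[i:i + n] == CEDILLA_UTF8


def cedilla_at_utf16(units, i):
    """Check for the combining cedilla at index i of UTF-16 code units."""
    # A single comparison is enough: U+0327 fits in one code unit.
    return units[i] == CEDILLA_UTF16


def utf16_units(text):
    """Hypothetical helper: split text into a list of UTF-16 code units."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]


# 'c' followed by the combining cedilla (the decomposed form of ç).
decomposed = "c\u0327"
```

With these definitions, cedilla_at_utf8(decomposed.encode("utf-8"), 1) needs the length check plus a two-byte comparison, while cedilla_at_utf16(utf16_units(decomposed), 1) is the one-element test from the pseudocode above.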


Daniël

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
