On Sun, 23 Nov 2008, Jonas Maebe wrote:


On 23 Nov 2008, at 13:31, Daniël Mantione wrote:

For an IDE, this is a little bit more complicated. For example, searching for a ç in a source file needs to find both the composed and the decomposed variant, and in UTF-8 characters are encoded in 1, 2, 3 or 4 bytes, all of which need to be handled. This is where UTF-16 and UTF-32 start to make sense.

Characters can also be decomposed in UTF-16 and in UTF-32 (for the same reasons as in UTF-8).

I am aware of that, but the combining cedilla is not in the "easy to process range" of UTF-8. In other words, you cannot do
"if char[i]=combining_cedille" in UTF-8.

Instead, in UTF-8, you need to make sure the string has enough bytes left, and then compare multiple bytes. Heck, you even need to take care of the fact that combining characters in general can be encoded in 2, 3 or 4 bytes.
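
To make the difference concrete, here is a minimal sketch in Python (not FPC code). It assumes well-formed input, where the combining cedilla U+0327 is the single code unit 0x0327 in UTF-16 and the two bytes CC A7 in UTF-8. The names COMBINING_CEDILLA, has_cedilla_utf16 and has_cedilla_utf8 are made up for illustration.

```python
# Combining cedilla, U+0327 (hypothetical constant name, echoing the check above).
COMBINING_CEDILLA = "\u0327"


def has_cedilla_utf16(units, i):
    """UTF-16: U+0327 fits in one 16-bit code unit, so one comparison suffices."""
    return units[i] == 0x0327


def has_cedilla_utf8(data, i):
    """UTF-8: make sure enough bytes remain, then compare the whole sequence."""
    pattern = COMBINING_CEDILLA.encode("utf-8")  # two bytes: CC A7
    return i + len(pattern) <= len(data) and data[i:i + len(pattern)] == pattern


# Decomposed "ç" (c + combining cedilla) followed by "a".
text = "c\u0327a"
utf8 = text.encode("utf-8")

# Split the UTF-16 little-endian bytes into a list of 16-bit code units.
raw = text.encode("utf-16-le")
utf16 = [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]
```

With these definitions, has_cedilla_utf16(utf16, 1) is a single comparison, while has_cedilla_utf8(utf8, 1) needs the length check plus a two-byte compare. Neither check finds the precomposed ç (U+00E7); a search would still have to normalize both strings or try both forms.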

Daniël
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
