Re: [Lazarus] Does Lazarus support a complete Unicode Component Library?

Vladimir Zhirov Sat, 01 Jan 2011 14:43:03 -0800

Juha Manninen wrote:

> So the conversion is only needed if a char inside the string
> is accessed by index?


No, the conversion is completely optional.
Summing up what was suggested, there are two ways to access a character
by index in a UTF-8 string:

1. Convert it to WideString/UnicodeString and use MyWideString[Index];
2. Use Utf8Copy(MyString, Index, 1);

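The limitation of the first approach is that it relies on every character
fitting in 2 bytes (one WideChar). Characters outside the Basic Multilingual
Plane, such as those of some languages and some special symbols (see
http://en.wikipedia.org/wiki/Supplementary_Multilingual_Plane#Supplementary_Multilingual_Plane
for the list of them), are stored in UTF-16 as a surrogate pair of two
WideChars. For them MyWideString[Index] returns only half of the character,
and the index of every following character is shifted by one. So this
approach does not support "true" Unicode, but works in most cases.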
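To show why the first approach breaks, here is a small illustration in Python rather than Pascal. It is not the Lazarus API. The hypothetical helper `utf16_char_at` imitates `MyWideString[Index]` by indexing 2-byte UTF-16 code units with a 1-based index, as Pascal strings do. U+1D11E (MUSICAL SYMBOL G CLEF) lies in the Supplementary Multilingual Plane, so it takes two code units.

```python
def utf16_char_at(s: str, index: int) -> str:
    """Hypothetical helper: return the index-th (1-based) UTF-16 code unit
    of s, imitating MyWideString[Index] on a WideString/UnicodeString."""
    units = s.encode("utf-16-le")  # 2 bytes per code unit, no BOM
    unit = units[(index - 1) * 2 : index * 2]
    # surrogatepass lets a lone surrogate half be decoded instead of raising
    return unit.decode("utf-16-le", errors="surrogatepass")


text = "a\U0001D11Eb"  # 'a', G clef (outside the BMP), 'b'
print(len(text.encode("utf-16-le")) // 2)  # 4 code units for 3 characters
print(repr(utf16_char_at(text, 2)))        # '\ud834': only the high surrogate
print(repr(utf16_char_at(text, 3)))        # '\udd1e': only the low surrogate
print(repr(utf16_char_at(text, 4)))        # 'b', although it is the 3rd character
```

Index 2 yields only half of the G clef, and 'b' is found at index 4 instead of 3.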

The second approach should handle this correctly (provided there are no
bugs), because it counts whole UTF-8 characters rather than 2-byte units.
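The idea behind the second approach can be sketched the same way. Walk the UTF-8 bytes and count only lead bytes, meaning any byte not of the form 10xxxxxx. A character is then counted once no matter how many bytes it occupies. The helper `utf8_copy_one` is hypothetical. It only imitates `Utf8Copy(MyString, Index, 1)` and is not the LCL implementation. It also assumes the input is valid UTF-8.

```python
def utf8_copy_one(data: bytes, index: int) -> bytes:
    """Hypothetical helper: return the index-th (1-based) character of the
    UTF-8 byte string `data`, imitating Utf8Copy(MyString, Index, 1).
    Returns b"" if index is out of range. Assumes valid UTF-8 input."""
    if index < 1:
        return b""
    count = 0
    start = None
    for pos, byte in enumerate(data):
        if byte & 0xC0 != 0x80:  # lead byte: a new character starts here
            count += 1
            if count == index:
                start = pos
            elif count == index + 1:
                return data[start:pos]  # stop at the next character's lead byte
    return data[start:] if start is not None else b""


data = "a\U0001D11Eb".encode("utf-8")  # 1 + 4 + 1 bytes
print(utf8_copy_one(data, 2) == "\U0001D11E".encode("utf-8"))  # True
print(utf8_copy_one(data, 3))  # b'b'
```

Every call scans from the start of the string, so each lookup takes time proportional to the string's length. Looping over all indexes this way is quadratic. For a full pass it is cheaper to walk the string once, character by character.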

> UTF8Encode returns UTF8String and the AnsiString parameter is
> internally typecasted to UnicodeString. How can that work?
> 
> Maybe Sven's example should use UTF8Decode.

Sure, UTF8Decode should have been used in this case.
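The direction matters, and a short Python sketch shows why. It assumes that the implicit AnsiString-to-UnicodeString conversion reads the bytes in the system ANSI code page, which is modelled here as Latin-1. UTF8Decode goes from UTF-8 bytes to Unicode text, which is what the example needed. Passing UTF-8 bytes to UTF8Encode instead reinterprets them as ANSI characters and encodes them a second time.

```python
utf8_bytes = "\u00e9".encode("utf-8")  # 'é' -> b'\xc3\xa9', as held in a UTF-8 AnsiString

# UTF8Decode direction: UTF-8 bytes -> Unicode text (one character)
decoded = utf8_bytes.decode("utf-8")

# Misuse of UTF8Encode: the bytes are first taken as ANSI characters
# (assumed Latin-1 here), giving two characters U+00C3 and U+00A9 ...
misread = utf8_bytes.decode("latin-1")
# ... which are then encoded to UTF-8 again: double-encoded "mojibake".
double_encoded = misread.encode("utf-8")

print(len(decoded), len(misread))  # 1 2
print(double_encoded)              # b'\xc3\x83\xc2\xa9'
```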

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
