On 21.08.2012 09:55, Graeme Geldenhuys wrote:
On 21 August 2012 07:10, Ivanko B<ivankob4m...@gmail.com> wrote:
How about supporting all versions of UCS-2 & UTF-16 in the RTL (for
fast per-char access and similar optimizations) and UTF-8 (for an
unlimited number of alphabets)?
All "access a char by index into a string" code I have seen works in a
sequential manner 99.99% of the time. For that reason there is no
speed difference between using a UTF-16 or a UTF-8 encoded string. Both
can be coded equally efficiently.
Graeme, this is simply not true. When searching for known German
characters in a UnicodeString, the program can use the simple approach
of indexing by character (code unit). This is even possible for known
Chinese symbols in the BMP. And a simple "if" for surrogate pairs is
more efficient than a 4-stage "case" for UTF-8.
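
To make the comparison concrete, here is a minimal sketch in Python rather than Pascal. The function names (utf16_units, next_utf16, next_utf8, find_bmp_unit) are hypothetical and not part of any RTL, and the code assumes well-formed input. It only illustrates the branch structure of the two decoders and the search by code unit index; it makes no claim about measured speed.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[k:k + 2], "little") for k in range(0, len(raw), 2)]


def next_utf16(units, i):
    """Decode one code point from UTF-16 code units at index i.

    Returns (code_point, next_index). Assumes well-formed input.
    """
    u = units[i]
    if 0xD800 <= u <= 0xDBFF:  # the single "if": high surrogate found
        low = units[i + 1]
        return 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00), i + 2
    return u, i + 1


def next_utf8(data, i):
    """Decode one code point from UTF-8 bytes at index i.

    Returns (code_point, next_index). Assumes well-formed input.
    """
    b = data[i]
    if b < 0x80:  # case 1: one byte (ASCII)
        return b, i + 1
    if b < 0xE0:  # case 2: two bytes
        return ((b & 0x1F) << 6) | (data[i + 1] & 0x3F), i + 2
    if b < 0xF0:  # case 3: three bytes
        cp = ((b & 0x0F) << 12) | ((data[i + 1] & 0x3F) << 6) | (data[i + 2] & 0x3F)
        return cp, i + 3
    # case 4: four bytes
    cp = (((b & 0x07) << 18) | ((data[i + 1] & 0x3F) << 12)
          | ((data[i + 2] & 0x3F) << 6) | (data[i + 3] & 0x3F))
    return cp, i + 4


def find_bmp_unit(units, target):
    """Return the index of the first code unit equal to target, or -1.

    target must be a BMP code point outside the surrogate range, so a
    matching code unit can never be half of a surrogate pair.
    """
    for i, u in enumerate(units):
        if u == target:
            return i
    return -1
```

A call such as find_bmp_unit(utf16_units(s), ord("ß")) never decodes anything; it compares code units one by one, which is the "simple approach" meant above.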
Martin
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel