Re: [fpc-devel] for-in-index loop

Daniël Mantione Fri, 25 Jan 2013 02:39:23 -0800


On Fri, 25 Jan 2013, Michael Schnell wrote:

On 01/25/2013 11:12 AM, Michael Van Canneyt wrote:
Pchar ?
You seem to miss my point: the n'th printable character in a UTF-8 encoded string (whether it is stored as a PChar or a string) starts at the m'th byte (m >= n).
To find m for a given n you need to scan all bytes before m.

Thus a loop such as

for I := 1 to 100000 do begin
  n := Random(100000) + 1;  // 1-based index; assumes Length(myString) >= 100000
  c := myString[n];
end;

is rather fast with ANSI-encoded strings.
When myString is encoded in UTF-8, it obviously yields raw code bytes instead of printable characters, and replacing the term myString[n] with a straightforward function that searches for the n'th printable character will be very slow.
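
A minimal sketch of such a "straightforward function", to make the cost concrete. It assumes well-formed UTF-8 and counts code points rather than user-perceived characters (combining sequences are not merged). The name Utf8CodePointStart is hypothetical; it is not an RTL routine.

```pascal
program Utf8Scan;
{$mode objfpc}{$H+}

// Returns the 1-based byte index at which the N'th code point of the
// UTF-8 string S starts, or 0 if S has fewer than N code points.
// Hypothetical helper; assumes S is well-formed UTF-8.
function Utf8CodePointStart(const S: RawByteString; N: SizeInt): SizeInt;
var
  I, Count: SizeInt;
begin
  Result := 0;
  Count := 0;
  for I := 1 to Length(S) do
    // Continuation bytes have the form 10xxxxxx; any other byte starts a code point.
    if (Ord(S[I]) and $C0) <> $80 then
    begin
      Inc(Count);
      if Count = N then
        Exit(I);
    end;
end;

var
  S: RawByteString;
begin
  // 'a', U+00E9 (2 bytes), U+20AC (3 bytes), 'z', written as explicit bytes
  S := 'a'#$C3#$A9#$E2#$82#$AC'z';
  WriteLn(Utf8CodePointStart(S, 1));  // 1
  WriteLn(Utf8CodePointStart(S, 3));  // 4
  WriteLn(Utf8CodePointStart(S, 4));  // 7
end.
```

Every call rescans the string from the first byte, so the loop above would do on the order of 100000 * Length(myString) byte comparisons in the worst case instead of 100000 array accesses.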

Yes, it is a known fact that this is a weakness of UTF-8. Consider transforming the string to UTF-16, UTF-32 or even an internal data structure before doing the random access.

Random access inside UTF-8 is an algorithmic time complexity issue. A language extension can only be a band-aid for that.
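
One possible shape for the "internal data structure" route, as a sketch only: scan the string once and remember where each code point starts, so that later lookups are constant time. BuildUtf8Index, Utf8CodePointAt and TOffsetArray are hypothetical names, the input is assumed to be well-formed UTF-8, and N is assumed to be in range.

```pascal
program Utf8Index;
{$mode objfpc}{$H+}

type
  TOffsetArray = array of SizeInt;

// Builds a table of the 1-based byte offsets at which each code point of S
// starts. One pass over the string. Hypothetical helper.
function BuildUtf8Index(const S: RawByteString): TOffsetArray;
var
  I, Count: SizeInt;
begin
  Result := nil;
  SetLength(Result, Length(S));  // upper bound: one code point per byte
  Count := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then  // not a continuation byte
    begin
      Result[Count] := I;
      Inc(Count);
    end;
  SetLength(Result, Count);  // shrink to the real number of code points
end;

// Returns the bytes of the N'th code point (1-based) using the table,
// without rescanning S. Assumes 1 <= N <= Length(Idx).
function Utf8CodePointAt(const S: RawByteString; const Idx: TOffsetArray;
  N: SizeInt): RawByteString;
var
  First, Next: SizeInt;
begin
  First := Idx[N - 1];
  if N < Length(Idx) then
    Next := Idx[N]
  else
    Next := Length(S) + 1;
  Result := Copy(S, First, Next - First);
end;

var
  S: RawByteString;
  Idx: TOffsetArray;
begin
  // 'a', U+00E9 (2 bytes), U+20AC (3 bytes), 'z', written as explicit bytes
  S := 'a'#$C3#$A9#$E2#$82#$AC'z';
  Idx := BuildUtf8Index(S);
  WriteLn(Length(Idx));                         // 4 code points
  WriteLn(Length(Utf8CodePointAt(S, Idx, 3)));  // 3 bytes
end.
```

The table costs one SizeInt per code point, which can exceed the memory a UTF-32 copy would need, and it must be rebuilt whenever the string changes. That is the trade-off: the scan is paid once up front rather than on every access.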


Daniël

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
