Re: [fpc-devel] Memory consumed by strings

Graeme Geldenhuys Sun, 23 Nov 2008 09:31:34 -0800

On Sun, Nov 23, 2008 at 3:45 PM, listmember <[EMAIL PROTECTED]> wrote:
>
> I am referring to going to the nth character in a string. With UTF-8 it is
> no more a simple arithmetic and an index operation. You have to start from
> zero and iterate until you get to your characters --at every step,
> calculating whether it is 2, 3 or 4 bytes long. Doing this is decompression.


Well, if the string is well-formed UTF-8, the first byte of each
character tells you how far to jump ahead, so you don't need to
visit every byte. You still walk the string, but one step per
character rather than one per byte.
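
To make the lead-byte jump concrete, here is a rough sketch in Python
(not FPC code). The names utf8_seq_len and utf8_char_offset are made
up for illustration, and the sketch assumes the input is already
well-formed UTF-8, so continuation bytes are never inspected.

```python
def utf8_seq_len(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1              # 0xxxxxxx: ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2              # 110xxxxx
    if 0xE0 <= lead <= 0xEF:
        return 3              # 1110xxxx
    if 0xF0 <= lead <= 0xF4:
        return 4              # 11110xxx
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead)


def utf8_char_offset(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) code point.

    Assumes `data` is well-formed UTF-8: only lead bytes are read,
    and each one says how far to jump to reach the next character.
    """
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("character index out of range")
        pos += utf8_seq_len(data[pos])
    if pos >= len(data):
        raise IndexError("character index out of range")
    return pos
```

The loop runs n times, once per character, no matter how many bytes
those characters occupy.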

With UTF-16 you can't just jump to the nth character either. You
have to check for surrogate pairs along the way.
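
The same kind of walk for UTF-16 might look like the sketch below,
again in Python and again with made-up names (to_utf16_units,
utf16_unit_index). It assumes well-formed UTF-16, so every high
surrogate is followed by a low surrogate.

```python
import struct


def to_utf16_units(text: str) -> list:
    """Helper for the example: a str as a list of 16-bit code units."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def utf16_unit_index(units: list, n: int) -> int:
    """Index of the code unit that starts the n-th (0-based) code point.

    Assumes well-formed UTF-16: a high surrogate (0xD800..0xDBFF)
    always starts a two-unit pair, anything else is a single unit.
    """
    pos = 0
    for _ in range(n):
        if pos >= len(units):
            raise IndexError("character index out of range")
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    if pos >= len(units):
        raise IndexError("character index out of range")
    return pos
```

For text that stays inside the BMP the index equals n, which is why
the problem is easy to overlook until a character outside the BMP
shows up.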

At least the good thing about UTF-8 is that you don't have to worry
about little-endian (LE) or big-endian (BE) byte order. UTF-16 and
UTF-32 both have that nasty issue.
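
A small illustration of the byte-order point, using Python's standard
codecs. The helper name encodings is made up for the example.

```python
def encodings(text: str) -> dict:
    """Encode `text` in UTF-8 and in both byte orders of UTF-16 and UTF-32."""
    names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
    return {name: text.encode(name) for name in names}


# U+20AC (euro sign) is one code unit in UTF-16 and in UTF-32,
# but its bytes are stored in a different order for LE and BE.
euro = encodings("\u20ac")
for name, raw in euro.items():
    print(name, raw.hex())

# Reading LE bytes as BE silently yields a different character.
misread = euro["utf-16-le"].decode("utf-16-be")
print("U+%04X" % ord(misread))
```

UTF-8 has a single byte sequence for each character, so there is
nothing to get wrong here. UTF-16 and UTF-32 need either a BOM or
out-of-band knowledge of the byte order.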


Regards,
  - Graeme -


_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
