On Sun, Nov 23, 2008 at 3:45 PM, listmember <[EMAIL PROTECTED]> wrote:
>
> I am referring to going to the nth character in a string. With UTF-8 it is
> no longer simple arithmetic and an index operation. You have to start from
> zero and iterate until you get to your character -- at every step,
> calculating whether it is 2, 3 or 4 bytes long. Doing this is decompression.

Well, if the string is well-formed UTF-8, the first byte of each character
tells you how far to jump ahead, so you don't need to visit every byte. You
still hop from one character to the next, so reaching the nth character
takes n steps, but each step is a single lookup on the lead byte.

With UTF-16 you can't just jump to the nth character either. It needs the
same kind of special attention, because you have to check for surrogate
pairs.

At least the good thing about UTF-8 is that you don't have to worry about
LE or BE byte order. UTF-16 and UTF-32 both have that nasty issue.

Regards,
  - Graeme -

_______________________________________________
fpGUI - a cross-platform Free Pascal GUI toolkit
http://opensoft.homeip.net/fpgui/

_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
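
P.S. To make the lead-byte jump concrete, here is a rough sketch. It is written in Python rather than Pascal only to keep it short, and it is not taken from FPC or fpGUI. The function names (utf8_seq_len, utf8_offset_of_char, utf16_offset_of_char) are made up for illustration. It assumes the input is already well-formed, and "character" here means a Unicode code point, not a grapheme.

```python
def utf8_seq_len(lead: int) -> int:
    """Return the byte length of the UTF-8 sequence starting with `lead`."""
    if lead < 0x80:
        return 1                      # 0xxxxxxx: single-byte (ASCII)
    if 0xC2 <= lead <= 0xDF:
        return 2                      # 110xxxxx: two-byte sequence
    if 0xE0 <= lead <= 0xEF:
        return 3                      # 1110xxxx: three-byte sequence
    if 0xF0 <= lead <= 0xF4:
        return 4                      # 11110xxx: four-byte sequence
    # Continuation bytes (0x80..0xBF) and 0xC0, 0xC1, 0xF5..0xFF
    # can never start a well-formed sequence.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead:#04x}")


def utf8_offset_of_char(data: bytes, n: int) -> int:
    """Byte offset of the n-th (0-based) code point in well-formed UTF-8.

    Only lead bytes are inspected; continuation bytes are skipped over.
    """
    pos = 0
    for _ in range(n):
        if pos >= len(data):
            raise IndexError("character index out of range")
        pos += utf8_seq_len(data[pos])
    if pos >= len(data):
        raise IndexError("character index out of range")
    return pos


def utf16_offset_of_char(units, n: int) -> int:
    """Index (in 16-bit code units) of the n-th code point in UTF-16.

    `units` is a sequence of integers, one per 16-bit code unit, already
    converted to native byte order (so the LE/BE question is settled
    before this function is called).
    """
    pos = 0
    for _ in range(n):
        if pos >= len(units):
            raise IndexError("character index out of range")
        # A high surrogate (0xD800..0xDBFF) means the code point
        # occupies two code units instead of one.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    if pos >= len(units):
        raise IndexError("character index out of range")
    return pos


if __name__ == "__main__":
    text = "a\u00e9\u20ac\U0001F600z"       # 1-, 2-, 3-, 4- and 1-byte characters
    encoded = text.encode("utf-8")
    print([utf8_offset_of_char(encoded, i) for i in range(5)])
```

Both walks are linear in n. The difference is only in what each step has to test: a lead-byte range for UTF-8, a surrogate range for UTF-16.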