On Fri, 16 Oct 2015 14:33:03 +0100 Martin Frb <[email protected]> wrote:
> On 16/10/2015 10:19, Tony Whyman wrote:
> >
> > In terms of "work", if I use functions such as UTF8Length and
> > ValidUTF8String on a GB18030 string should they always work, or are
> > there exceptions?
>
> IIRC ... UTF8Length counts codepoints, not chars. So if the chars you
> are interested in have chars that need more than one codepoint then this
> is not the length in char.

True.

> This can even happen with some western languages, but it is not likely
> with them.

Actually decomposed characters are pretty common in western languages,
for example on OS X HFS+.
And afaik Chinese in Unicode usually use precomposed characters, does
it not?

> The same is for char accessing function (NextUtf8CharByteLen or
> similar). They only get codepoints.

Mattias

--
_______________________________________________
Lazarus mailing list
[email protected]
http://lists.lazarus.freepascal.org/mailman/listinfo/lazarus
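
To make the codepoint-versus-character distinction above concrete, here is a minimal sketch in Python rather than Pascal, so it does not use the LazUTF8 unit at all. The helper names `codepoint_count` and `approx_char_count` are hypothetical and chosen only for illustration. Counting codepoints that are not combining marks is merely an approximation of user-perceived characters, not a full grapheme cluster segmentation.

```python
import unicodedata


def codepoint_count(s: str) -> int:
    """Number of Unicode codepoints (a Python str is a codepoint sequence)."""
    return len(s)


def approx_char_count(s: str) -> int:
    """Rough count of user-perceived characters (hypothetical helper).

    Skips codepoints with a non-zero canonical combining class, so a base
    letter followed by a combining accent counts once. This is an
    approximation only, not a full grapheme cluster algorithm.
    """
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


precomposed = "\u00e9"                                   # e-acute as one codepoint
decomposed = unicodedata.normalize("NFD", precomposed)   # 'e' followed by U+0301
han = "\u4e2d"                                           # a CJK ideograph

for label, text in (("precomposed", precomposed),
                    ("decomposed", decomposed),
                    ("han", han)):
    print(label,
          "bytes:", len(text.encode("utf-8")),
          "codepoints:", codepoint_count(text),
          "approx chars:", approx_char_count(text))
```

The decomposed form is the kind of string the HFS+ remark refers to: it has two codepoints but displays as one character, so a codepoint-based length overstates the character count. The ideograph in this sketch has no canonical decomposition, so its counts agree.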
