Rainer Deyke wrote:
> Charles Hixson wrote:
>> I've read and re-read the documentation, but I can't decide whether a UTF-8 character that takes multiple bytes to express counts as one or multiple values in length and sizeof. Sizeof seems to presume that all entries are the same length, but otherwise it seems to be the property I need. (I suppose that I could just enter a string that I know contains multi-byte chars, but it sure would be better if I could find out from the documentation.) I'm pretty certain that it just counts as one character for indexing, so length would almost need to also count the number of characters rather than bytes.
>
> Strings are just arrays of code units. Their length is the number of elements (i.e. code units) they contain, just like other arrays. A code point may comprise multiple code units, and a logical character may comprise multiple code points. The latter is true even with dchar/UTF-32.

So, in UTF-8, length is the number of bytes in the string and sizeof is 8 (on 32-bit systems).
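
For what it's worth, here is a minimal sketch of the distinction, assuming a reasonably recent Phobos (walkLength and byGrapheme are my choice for counting code points and logical characters; nobody in the thread mentioned them). The sample strings are made up for illustration.

```d
import std.range.primitives : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    string s = "h\u00E9llo";          // U+00E9 takes two UTF-8 code units
    writeln(s.length);                // code units (bytes for UTF-8): 6
    writeln(s.walkLength);            // code points: 5

    dstring d = "e\u0301"d;           // 'e' plus a combining acute accent
    writeln(d.length);                // 2 code points, even in UTF-32
    writeln(d.byGrapheme.walkLength); // 1 logical character

    // .sizeof describes the slice itself (pointer + length), not the text,
    // so it is 8 on 32-bit systems and 16 on 64-bit ones.
    static assert(string.sizeof == 2 * size_t.sizeof);
}
```

Note that sizeof never changes with the contents of the string; only length does.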
Jerome
--
mailto:[email protected]
http://jeberger.free.fr
Jabber: [email protected]
