Charles Hixson wrote:
> I've read and re-read the documentation, but I can't decide whether a
> UTF-8 character that takes multiple bytes to express counts as one or
> multiple values in length and sizeof. Sizeof seems to presume that all
> entries are the same length, but otherwise it seems to be the property I
> need. (I suppose that I could just enter a string that I know is
> multi-byte chars, but it sure would be better if I could find out from
> the documentation.) I'm pretty certain that it just counts as one
> character for indexing, so length would almost need to also count the
> number of characters rather than bytes.

Strings are just arrays of code units. Their length is the number of elements (i.e. code units) they contain, just like other arrays. A code point may comprise multiple code units, and a logical character may comprise multiple code points. The latter is true even with dchar/utf-32.

-- 
Rainer Deyke - [email protected]
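
The three levels described above (code units, code points, logical characters) can be checked with a short D program. This is only a sketch, not part of the original reply: it assumes a current D2 compiler with Phobos, where `std.range.walkLength` and `std.uni.byGrapheme` are available, and the sample string (an 'e' followed by a combining accent) is chosen purely for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // Illustrative sample: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    // One logical character, two code points.
    string  s8  = "e\u0301";   // UTF-8 code units (char)
    wstring s16 = "e\u0301"w;  // UTF-16 code units (wchar)
    dstring s32 = "e\u0301"d;  // UTF-32 code units (dchar)

    // .length counts array elements, i.e. code units.
    assert(s8.length  == 3);   // 'e' is 1 byte, U+0301 is 2 bytes in UTF-8
    assert(s16.length == 2);
    assert(s32.length == 2);   // two code points, even in UTF-32

    // Indexing is by code unit as well: s8[1] is the first byte
    // of the encoded accent, not a whole character.
    assert(s8[1] == 0xCC);

    // .sizeof is the size of the slice itself (pointer + length),
    // not of the text it refers to.
    assert(s8.sizeof == 2 * size_t.sizeof);

    // Walking a narrow string as a range decodes it to code points.
    assert(s8.walkLength == 2);

    // Counting logical characters requires grapheme segmentation.
    assert(s8.byGrapheme.walkLength == 1);
}
```

So `length` and indexing both work on code units, and neither of them nor `sizeof` gives a character count. Counting code points or logical characters takes an explicit decoding step such as the two range-based calls at the end.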
