On Wednesday, 14 August 2013 at 02:53:43 UTC, jicman wrote:
know the exact length of the characters that I have in a char[]
variable? Thanks.
Your code looks like D1...
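In D1 or D2:
import std.stdio;
import std.utf; // toUTF32 lives in std.utf, not std.uni
dstring s2 = toUTF32(str); // re-encode as UTF-32, one element per code point
writeln(s2.length); // 13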
In D2 you can do it a little more efficiently, without allocating
a new string, like this:
import std.range;
writeln(walkLength(str)); // 13
The reason it shows 39 instead of 13 is that the char[] is UTF-8,
and Chinese characters are multi-byte in UTF-8 (typically 3 bytes
each, hence 13 x 3 = 39). The .length property gives the number of
elements in the array, which for char[] are UTF-8 bytes, not
characters.
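
To make the byte count vs. character count difference concrete,
here is a minimal D2 sketch. The helper names byteCount and
codePointCount and the sample string are made up for illustration;
they are not from your code.

```d
import std.range : walkLength;
import std.stdio : writeln;

/// Hypothetical helper: number of UTF-8 code units (bytes) in s.
size_t byteCount(const(char)[] s)
{
    return s.length;
}

/// Hypothetical helper: number of code points in s.
/// walkLength decodes the UTF-8 as it iterates, so no copy is made.
size_t codePointCount(const(char)[] s)
{
    return s.walkLength;
}

void main()
{
    char[] str = "你好, D".dup;      // two 3-byte characters plus three ASCII ones
    writeln(byteCount(str));      // 9
    writeln(codePointCount(str)); // 5
}
```

For pure ASCII text the two numbers are the same, which is why the
problem only shows up with non-English text.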
dstring uses UTF-32, which has a fixed size (one element) for each
code point. A code point isn't technically quite the same thing as
a character, since some characters are built from several code
points, but it's close enough that it works here.
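
For a case where code points and characters differ, here is a
hedged D2 sketch. It assumes your Phobos is new enough to ship
std.uni.byGrapheme; graphemeCount is a made-up helper name, and the
sample string (an "e" followed by a combining acute accent) is
only an illustration.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical helper: number of user-perceived characters
/// (grapheme clusters) in s.
size_t graphemeCount(const(char)[] s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string s = "e\u0301";        // 'e' + combining acute accent, displayed as one character
    writeln(s.length);           // 3 (UTF-8 bytes)
    writeln(s.walkLength);       // 2 (code points)
    writeln(graphemeCount(s));   // 1 (grapheme)
}
```

For the Chinese text in your example each character is a single
code point, so walkLength and the grapheme count should agree.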
Bottom line though, char[] for non-English text tends to have a
longer length than you expect because a lot of characters are
multi-byte in UTF-8. If you use dstring, the length is the number
of code points, which matches the character count much more often.