Have a look here [1]. For example, if you have a code point between U+0080 and U+07FF, you know that you need two bytes to encode that whole code point.

[1] http://en.wikipedia.org/wiki/UTF-8#Description
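For illustration, here is a minimal sketch (C++ is assumed, and the function name is made up) of how the high bits of the lead byte determine the length of the sequence:

#include <cstdint>

// Length in bytes of the UTF-8 sequence starting with `lead`, or 0 if `lead`
// is a continuation byte (10xxxxxx) or otherwise not a valid lead byte.
int utf8_sequence_length(std::uint8_t lead) {
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: U+0000..U+007F (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: U+0080..U+07FF
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: U+0800..U+FFFF
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: U+10000..U+10FFFF
    return 0;                            // continuation or invalid byte
}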

Thanks. I already solved it myself for UTF-8. I chose an approach using a bitmask. It may not be the most efficient, but it works:

( str[index] & 0b10000000 ) == 0          || // 0xxxxxxx: single-byte (ASCII) code point
( str[index] & 0b11100000 ) == 0b11000000 || // 110xxxxx: lead byte of a 2-byte sequence
( str[index] & 0b11110000 ) == 0b11100000 || // 1110xxxx: lead byte of a 3-byte sequence
( str[index] & 0b11111000 ) == 0b11110000    // 11110xxx: lead byte of a 4-byte sequence

If it is true, it means that the first byte of a sequence has been found, and I can count them. Am I right that this count equals the number of graphemes, or are there exceptions to this rule?
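A compact way to write the same test is to count every byte that is not a 10xxxxxx continuation byte. A minimal sketch under the same C++ assumption as above, with a made-up function name; it counts lead bytes, i.e. code points in well-formed UTF-8:

#include <cstddef>
#include <string>

// Counts UTF-8 lead bytes, i.e. every byte that is not a continuation byte.
// For well-formed UTF-8 this equals the number of code points.
std::size_t count_utf8_lead_bytes(const std::string& str) {
    std::size_t count = 0;
    for (unsigned char c : str) {
        if ((c & 0xC0) != 0x80) // 10xxxxxx marks a continuation byte
            ++count;
    }
    return count;
}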

For UTF-32 the number of code units is just equal to the number of graphemes. And what about UTF-16? Is it possible to detect the first code unit of an encoded sequence?
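For what it's worth, a sketch of the corresponding check for UTF-16 (same C++ assumption, made-up function name): the only code units that do not start a sequence are the low (trailing) surrogates 0xDC00..0xDFFF, which form the second half of a surrogate pair, so everything else can be treated as a lead unit.

#include <cstdint>

// True if `unit` starts a UTF-16 sequence, i.e. it is anything except a
// low (trailing) surrogate in the range 0xDC00..0xDFFF.
bool is_utf16_lead_unit(std::uint16_t unit) {
    return (unit & 0xFC00) != 0xDC00;
}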
