Have a look here [1]. For example, if you have a code point between U+0080 and U+07FF, you know that you need two bytes to encode that whole code point.

[1] http://en.wikipedia.org/wiki/UTF-8#Description
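For illustration, here is a minimal sketch (C++ is assumed, and the function name is made up) of how the high bits of the lead byte determine the length of the sequence:

#include <cstdint>

// Length in bytes of the UTF-8 sequence starting with `lead`, or 0 if `lead`
// is a continuation byte (10xxxxxx) or otherwise not a valid lead byte.
int utf8_sequence_length(std::uint8_t lead) {
    if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: U+0000..U+007F (ASCII)
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx: U+0080..U+07FF
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx: U+0800..U+FFFF
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx: U+10000..U+10FFFF
    return 0;                            // continuation or invalid byte
}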

Thanks. I already solved it myself for UTF-8. I chose an approach using a bitmask. It may not be the most efficient, but it works:

( str[index] & 0b10000000 ) == 0          || // 0xxxxxxx: single-byte (ASCII) code point
( str[index] & 0b11100000 ) == 0b11000000 || // 110xxxxx: lead byte of a 2-byte sequence
( str[index] & 0b11110000 ) == 0b11100000 || // 1110xxxx: lead byte of a 3-byte sequence
( str[index] & 0b11111000 ) == 0b11110000    // 11110xxx: lead byte of a 4-byte sequence

If it is true, it means that the first byte of a sequence has been found, and I can count them. Am I right that this count equals the number of graphemes, or are there exceptions to this rule?
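A compact way to write the same test is to count every byte that is not a 10xxxxxx continuation byte. A minimal sketch under the same C++ assumption as above, with a made-up function name; it counts lead bytes, i.e. code points in well-formed UTF-8:

#include <cstddef>
#include <string>

// Counts UTF-8 lead bytes, i.e. every byte that is not a continuation byte.
// For well-formed UTF-8 this equals the number of code points.
std::size_t count_utf8_lead_bytes(const std::string& str) {
    std::size_t count = 0;
    for (unsigned char c : str) {
        if ((c & 0xC0) != 0x80) // 10xxxxxx marks a continuation byte
            ++count;
    }
    return count;
}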

For UTF-32 the number of code units is just equal to the number of graphemes. And what about UTF-16? Is it possible to detect the first code unit of an encoded sequence?
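For what it's worth, a sketch of the corresponding check for UTF-16 (same C++ assumption, made-up function name): the only code units that do not start a sequence are the low (trailing) surrogates 0xDC00..0xDFFF, which form the second half of a surrogate pair, so everything else can be treated as a lead unit.

#include <cstdint>

// True if `unit` starts a UTF-16 sequence, i.e. it is anything except a
// low (trailing) surrogate in the range 0xDC00..0xDFFF.
bool is_utf16_lead_unit(std::uint16_t unit) {
    return (unit & 0xFC00) != 0xDC00;
}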
