On Fri, 19 May 2017 15:17:55 +0200, Anton Lindqvist <anton.lindqv...@gmail.com> wrote:
On Fri, May 19, 2017 at 09:33:33AM -0300, Lucas Gabriel Vuotto wrote:
On 19/05/17 03:42, Anton Lindqvist wrote:
>
> +static int
> +u8len(unsigned char c)
> +{
> +  switch (c & 0xF0) {
> +  case 0xF0:
> +          return 4;
> +  case 0xE0:
> +          return 3;
> +  case 0xC0:
> +          return 2;
> +  default:
> +          return 1;
> +  }
> +}
> +

This is wrong: most code points in the range U+0080-U+07FF (those from U+0400 upward, whose lead byte is 0xD0-0xDF) would be treated as 1 byte long instead of 2.

Thanks for the heads-up. Maybe a more reliable solution would be to call
mbtowc(3) repeatedly as new input arrives until it returns successfully,
assuming the first byte read is a UTF-8 start byte.
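
For illustration only, the mbtowc(3) idea could look roughly like the sketch below. The helper name try_decode and its return convention are hypothetical and not part of the patch. It also assumes the caller has already selected a UTF-8 LC_CTYPE with setlocale(3). Note that mbtowc(3) returns -1 both for an incomplete and for an invalid sequence, so the sketch only gives up once MB_CUR_MAX bytes have been collected.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: try to decode one character from the n bytes
 * collected so far in buf.
 *
 * Returns the number of bytes consumed on success, 0 if more input
 * should be read first, and -1 if the bytes cannot form a character
 * in the current locale.
 */
static int
try_decode(const char *buf, size_t n, wchar_t *wc)
{
	int len;

	if (n == 0)
		return 0;

	/* Reset any conversion state left over from a failed attempt. */
	(void)mbtowc(NULL, NULL, 0);

	len = mbtowc(wc, buf, n);
	if (len > 0)
		return len;
	if (len == 0)
		return 1;	/* a NUL byte was decoded */

	/* -1: incomplete or invalid, mbtowc cannot tell which. */
	if (n < (size_t)MB_CUR_MAX)
		return 0;	/* could still be incomplete, read more */
	return -1;
}
```

A caller would append each newly read byte to buf and call try_decode again until it returns something other than 0.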

Not needed. Only case 0xD0 is missing.

case 0xC0: case 0xD0:
 return 2;
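
Applied to the function from the quoted patch, the suggested fix would give roughly the following. This is a sketch of the combined result, not the committed code. It classifies lead bytes only: continuation bytes (0x80-0xBF) fall through to the default case, and the invalid lead bytes 0xF8-0xFF are still reported as 4.

```c
/*
 * Length in bytes of the UTF-8 sequence introduced by lead byte c.
 * Two-byte sequences start with 110xxxxx, i.e. 0xC0-0xDF, so both
 * 0xC0 and 0xD0 must be matched after masking with 0xF0.
 */
static int
u8len(unsigned char c)
{
	switch (c & 0xF0) {
	case 0xF0:
		return 4;
	case 0xE0:
		return 3;
	case 0xC0:
	case 0xD0:
		return 2;
	default:
		return 1;
	}
}
```

With this change, a lead byte such as 0xD0 (the first byte of U+0400, encoded as D0 80) is counted as starting a 2-byte sequence.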



