On Sat Mar 29 21:46:33 EDT 2014, [email protected] wrote:
> very good.
> 
> one question about:
> 
> -             x = re2or(x, rclass(ov, Runemask));
> +             x = re2or(x, rclass(ov, 0xffff));
> 
> this seems wrong for 21 bit runes (the old is also wrong i think).
>
> shouldnt that be:
> 
> +             x = re2or(x, rclass(ov, Runemax));
> 
> as Runemask (0x1fffff) is not a valid rune for 21-bit rune
>  as it is >Runemax.

yes, that's correct.  i left it at 0xffff because there was still a bug.
tab2 still needs to burst the leading bytes so we enum all
the cases.  i think tab2 should be

Rune    tab2[] =
{
        0x003f,
        0x0fff,
        0x07ffff,
};

since the first byte of the 21-bit rune is 0b11110xxx.

what do you think?
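
for reference, a minimal sketch of where the 0b11110xxx lead byte comes
from.  this is not the grep code; enc4 is a hypothetical helper written
only for illustration, and it assumes the standard 4-byte utf-8 layout
(3 payload bits in the lead byte, 6 in each of three continuation bytes).

```c
#include <stdint.h>

/*
 * enc4 (hypothetical helper, not part of grep): encode a rune in
 * the 4-byte utf-8 range 0x10000..0x10FFFF into buf.
 * returns 0 on success, -1 if r is outside that range.
 */
int
enc4(uint32_t r, unsigned char buf[4])
{
	if(r < 0x10000 || r > 0x10FFFF)
		return -1;
	buf[0] = 0xF0 | (r >> 18);		/* 0b11110xxx: top 3 bits */
	buf[1] = 0x80 | ((r >> 12) & 0x3F);	/* 0b10xxxxxx */
	buf[2] = 0x80 | ((r >> 6) & 0x3F);	/* 0b10xxxxxx */
	buf[3] = 0x80 | (r & 0x3F);		/* 0b10xxxxxx */
	return 0;
}
```

with this layout the lead byte runs from 0xf0 (for 0x10000) up to
0xf4 (for 0x10ffff), so only a few of the eight 0b11110xxx values
ever appear for valid runes.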

> as i understand it, tab1[] array contains the last valid rune
> in a range of the same utf8 encoding length.
> 
> basically:
> 
> 0-0x7f -> 1 byte, 0x80-0x7ff -> 2 byte  etc...
> 
> so adding 0xffff is right. the next would be 0x10ffff for 21 bit
> runes but there shouldnt be any runes above 0x10ffff.
> 
> makes any sense?

since the tab1 array is bursting at byte boundaries, the next
burst is at 0x1fffff.  but that's in undefined territory.
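
the boundaries above can be worked out from the bit counts.  a rough
sketch, assuming the usual utf-8 layout; maxbits and lastrune are
hypothetical names for illustration, not functions from the grep source,
and Lastvalid just stands in for Runemax (0x10ffff):

```c
#include <stdint.h>

enum {
	Lastvalid = 0x10FFFF,	/* stand-in for Runemax in this sketch */
};

/*
 * maxbits (hypothetical): largest value an n-byte utf-8 sequence
 * can hold, n = 1..4.  one byte carries 7 bits; otherwise the lead
 * byte carries 7-n bits and each continuation byte carries 6.
 */
uint32_t
maxbits(int n)
{
	int bits;

	if(n == 1)
		bits = 7;
	else
		bits = (7 - n) + 6*(n - 1);
	return ((uint32_t)1 << bits) - 1;
}

/*
 * lastrune (hypothetical): last valid rune with an n-byte encoding,
 * i.e. the byte-boundary value clamped to the end of the code space.
 */
uint32_t
lastrune(int n)
{
	uint32_t m;

	m = maxbits(n);
	return m > Lastvalid ? Lastvalid : m;
}
```

maxbits(1..4) gives 0x7f, 0x7ff, 0xffff and 0x1fffff, the byte
boundaries; lastrune(4) gives 0x10ffff, the last rune that is
actually defined.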

- erik
