On Sat Mar 29 21:46:33 EDT 2014, [email protected] wrote:
> very good.
>
> one question about:
>
> - x = re2or(x, rclass(ov, Runemask));
> + x = re2or(x, rclass(ov, 0xffff));
>
> this seems wrong for 21 bit runes (the old is also wrong i think).
>
> shouldnt that be:
>
> + x = re2or(x, rclass(ov, Runemax));
>
> as Runemask (0x1fffff) is not a valid rune for 21-bit rune
> as it is >Runemax.
yes, that's correct. i left it at 0xffff because there was still a bug.
tab2 still needs to burst the leading bytes so we enum all
the cases. i think tab2 should be
	Rune tab2[] =
	{
		0x003f,
		0x0fff,
		0x07ffff,
	};
since the first byte of the 21-bit rune is 0b11110xxx.
what do you think?
> as i understand it, tab1[] array contains the last valid rune
> in a range of the same utf8 encoding length.
>
> basically:
>
> 0-0x7f -> 1 byte, 0x80-0x7ff -> 2 bytes, etc...
>
> so adding 0xffff is right. the next would be 0x10ffff for 21 bit
> runes but there shouldnt be any runes above 0x10ffff.
>
> makes any sense?
since the tab1 array is bursting at byte boundaries, the next
burst is at 0x1fffff. but that's in undefined territory.
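to make the byte-boundary arithmetic concrete, here is a rough
sketch. the names burstmax and lastrune are made up for
illustration and are not in the grep source, and Rune is assumed
to be at least 21 bits wide. it only shows where the encoding
lengths change, not how rclass should use them.

```c
#include <assert.h>

typedef unsigned int Rune;	/* assumption: at least 21 bits wide */

enum {
	Runemax = 0x10ffff,	/* last valid unicode code point */
};

/* largest value an n-byte utf-8 sequence can carry, n = 1..4 */
Rune
burstmax(int n)
{
	assert(n >= 1 && n <= 4);
	if(n == 1)
		return 0x7f;	/* 0xxxxxxx: 7 payload bits */
	/* lead byte carries 7-n payload bits; each trailing byte carries 6 */
	return ((Rune)1 << ((7 - n) + 6*(n - 1))) - 1;
}

/* same boundary, clamped so it never passes Runemax */
Rune
lastrune(int n)
{
	Rune r;

	r = burstmax(n);
	return r > Runemax ? Runemax : r;
}
```

burstmax(4) is where the 4-byte form runs out of bits, while
lastrune(4) stops at Runemax, which is the distinction being
argued about above.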
- erik