thanks erik.
that fixed the problems with my sample data!

Kenji Arisawa

On 2014/03/30 8:54, erik quanstrom <[email protected]> wrote:

>> Hello,
>>
>> I found a strange bug in grep.
>> some Japanese runes does not match ‘[^0-9]’.
>>
>> for example ‘ま' (307e) and ‘み’(307f).
>>
>
> i can't replicate here with 9atom's fixes to grep.
> with the same t3 file as you've got,
>
> ; wc -l /tmp/t3
> 21 /tmp/t3
> ; grep -v '^[0-9]' /tmp/t3 | wc -l
> 21
>
> i have some other differences in grep, including -I (same
> as -i, except fold runes), but i think the differences in
> comp.c are what cause the bug. in particular, you really
> need that 0xffff entry in the tabs.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:135,145 - comp.c:135,147
>   {
>   	0x007f,
>   	0x07ff,
> + 	0xffff,
>   };
>   Rune tab2[] =
>   {
>   	0x003f,
>   	0x0fff,
> + 	0xffff,
>   };
>
>   Re2
>
> the additional pairs and the correction to the combining case
> here were not accepted to sources, but they allow for large character
> classes generated used by folding. many of the characters are contiguous
> so getting the contiguous case right is important.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:215,221 - comp.c:217,223
>   Re2
>   re2class(char *s)
>   {
> - 	Rune pairs[200+2], *p, *q, ov;
> + 	Rune pairs[400+2], *p, *q, ov;
>   	int nc;
>   	Re2 x;
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:234,240 - comp.c:236,242
>   			break;
>   		p[1] = *p;
>   		p += 2;
> - 		if(p >= pairs + nelem(pairs) - 2)
> + 		if(p == pairs + nelem(pairs) - 2)
>   			error("class too big");
>   		s += chartorune(p, s);
>   		if(*p != '-')
> /n/sources/plan9/sys/src/cmd/grep/comp.c:254,260 - comp.c:256,262
>   	for(p=pairs+2; *p; p+=2) {
>   		if(p[0] > p[1])
>   			continue;
> - 		if(p[0] > q[1] || p[1] < q[0]) {
> + 		if(p[0] > q[1]+1 || p[1] < q[0]) {
>   			q[2] = p[0];
>   			q[3] = p[1];
>   			q += 2;
>
> i believe this case is also critical. split the bmp off.
>
> /n/sources/plan9/sys/src/cmd/grep/comp.c:275,281 - comp.c:277,283
>   			x = re2or(x, rclass(ov, p[0]-1));
>   			ov = p[1]+1;
>   		}
> - 		x = re2or(x, rclass(ov, Runemask));
> + 		x = re2or(x, rclass(ov, 0xffff));
>   	} else {
>   		x = rclass(p[0], p[1]);
>   		for(p+=2; *p; p+=2)
>
> - erik
>
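
for anyone following the q[1]+1 change: as i read it, the idea is that two
ranges that merely touch (one ends at 307e, the next starts at 307f) should
be folded into one range, not kept as two. the sketch below is only my own
illustration of that idea in portable C, not the grep code. Range, Rune32,
rangecmp and mergeranges are hypothetical names, and i sort with qsort
for simplicity, which is an assumption and not what comp.c does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef unsigned int Rune32;	/* hypothetical stand-in for Plan 9's Rune */

typedef struct Range Range;
struct Range {
	Rune32 lo;
	Rune32 hi;	/* inclusive upper bound */
};

/* order ranges by lower bound, then by upper bound */
static int
rangecmp(const void *a, const void *b)
{
	const Range *x = a, *y = b;

	if(x->lo != y->lo)
		return x->lo < y->lo ? -1 : 1;
	if(x->hi != y->hi)
		return x->hi < y->hi ? -1 : 1;
	return 0;
}

/*
 * sort r[0..n-1] and merge in place every pair of ranges that
 * overlap or are contiguous (next.lo == prev.hi + 1).
 * returns the number of ranges left at the front of r.
 */
size_t
mergeranges(Range *r, size_t n)
{
	size_t i, q;

	if(n == 0)
		return 0;
	for(i = 0; i < n; i++)
		assert(r[i].lo <= r[i].hi);	/* caller must not pass inverted ranges */
	qsort(r, n, sizeof r[0], rangecmp);
	q = 0;
	for(i = 1; i < n; i++){
		/* same test as lo <= hi+1, written so that hi+1 cannot wrap around */
		if(r[i].lo <= r[q].hi || r[i].lo - r[q].hi == 1){
			if(r[i].hi > r[q].hi)
				r[q].hi = r[i].hi;
		}else
			r[++q] = r[i];
	}
	return q + 1;
}
```

with the ranges 3041-307e and 307f-3096 as input, this leaves the single
range 3041-3096. a test that only checks for overlap would keep them as two.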
