Paul Eggert wrote:
> IIRC it's because a CSET matches any byte, while the corresponding
> MBCSET only matches that byte if it is a single-byte character.
> So for example, say "\x82\x61" is a two-byte character. The CSET "A"
> will match it but the corresponding MBCSET will not.
>
> This can happen in the Shift-JIS encoding.

At first I also thought of such a case. But perhaps it is not a problem, because the DFA will never come across a CSET on the second byte in Shift_JIS.

"grep -i A" -> [Aa] -> CSET
"grep -i $"\x82A" -> [$"\x82\x82A"$"\x82\x82"] -> \x82 A CAT \x82 \x82 CAT OR

The latter will never be \x82 [A\x82] -> \x82 CSET CAT.
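
To make the case under discussion concrete, here is a minimal sketch in Python. It is not the dfa.c code. The helper names (is_sjis_lead, bytewise_match, charwise_match) are made up for illustration, and the lead-byte ranges 0x81-0x9F and 0xE0-0xFC are the usual Shift_JIS ones, assumed here rather than taken from grep. It only shows how a byte-wise set test and a character-aware test can disagree on the trail byte of "\x82\x61".

```python
def is_sjis_lead(b):
    """Return True if b is a Shift_JIS lead byte (assumed ranges)."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def bytewise_match(data, byte_set):
    """CSET-like test: report every offset whose byte is in byte_set."""
    return [i for i, b in enumerate(data) if b in byte_set]


def charwise_match(data, byte_set):
    """MBCSET-like test: report only single-byte characters in byte_set."""
    hits = []
    i = 0
    while i < len(data):
        if is_sjis_lead(data[i]) and i + 1 < len(data):
            i += 2  # skip the whole two-byte character, trail byte included
            continue
        if data[i] in byte_set:
            hits.append(i)
        i += 1
    return hits


# "\x82\x61" is one two-byte character; the final "a" is a single byte.
text = b"\x82\x61a"
print(bytewise_match(text, b"Aa"))  # [1, 2]
print(charwise_match(text, b"Aa"))  # [2]
```

The byte-wise test reports offset 1, which is the trail byte of the two-byte character, while the character-aware test does not. Whether the DFA can actually reach a CSET at such an offset is the question being argued above; the sketch does not settle it.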
