On 03/14/2010 02:16 AM, Norihiro Tanaka wrote:
Hi,
With this patch, `dfaexec' will be called even when the multibyte check
fails for a simple pattern that contains no wildcard or repetition
expression.
Is that intended?
Yes; see for example bug 23814. There, I'm searching for \xAA\xBB;
kwset could in principle give an exact match, but it finds only an
unaligned match in \xBB\xAA\xBB\xAA. Note that the DFA search runs only
on the line that kwset selected anyway. Also, for UTF-8 the is_mb_middle
test should always succeed unless an invalid UTF-8 character gets into
the DFA's "must" kwset.
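
To make the unaligned-match case concrete, here is a minimal sketch in
Python rather than grep's C. It is not grep's code: find_all, toy_starts
and utf8_starts are hypothetical helpers, and the "toy" encoding (any
byte >= 0x80 starts a two-byte character) is an assumption standing in
for a legacy multibyte charset. It only illustrates why a byte-oriented
hit has to be rechecked, and why valid UTF-8 does not have the problem.

```python
def find_all(haystack: bytes, needle: bytes) -> list[int]:
    """Byte-oriented search: every offset where needle occurs."""
    hits = []
    pos = haystack.find(needle)
    while pos != -1:
        hits.append(pos)
        pos = haystack.find(needle, pos + 1)
    return hits


def toy_starts(buf: bytes) -> set[int]:
    """Character starts under the assumed toy encoding:
    a byte >= 0x80 begins a two-byte character, any other byte is one
    character on its own."""
    starts = set()
    i = 0
    while i < len(buf):
        starts.add(i)
        i += 2 if buf[i] >= 0x80 else 1
    return starts


def utf8_starts(buf: bytes) -> set[int]:
    """Character starts in UTF-8: every byte that is not a
    continuation byte (10xxxxxx)."""
    return {i for i, b in enumerate(buf) if (b & 0xC0) != 0x80}


# The case from the message: the bytes match, but only in the middle
# of a character.
line = b"\xbb\xaa\xbb\xaa"
pattern = b"\xaa\xbb"
byte_hits = find_all(line, pattern)
aligned_hits = [p for p in byte_hits if p in toy_starts(line)]

# The UTF-8 case: a valid needle can only match on character starts.
text = "éaé".encode("utf-8")
needle = "é".encode("utf-8")
utf8_hits = find_all(text, needle)
utf8_all_aligned = all(p in utf8_starts(text) for p in utf8_hits)

print(byte_hits, aligned_hits, utf8_hits, utf8_all_aligned)
```

In the toy encoding the only byte-level hit is at offset 1, which is not
a character start, so the line still needs a multibyte-aware matcher to
decide. In the UTF-8 case every hit lands on a character start, because
a valid needle begins with a non-continuation byte.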
The alternative is making kwset multibyte-aware, which is probably not
impossible but not easy either; I would know how to do it only if I
could specialize kwset with knowledge of the particular charsets, which
is not good.
Paolo