This patch works around some of the performance problems of multibyte grep. The patch has been in RHEL-6 for a few months. I think it is also a correctness patch, since grep has no way to support multi-character collation elements.
For UTF-8 it should trigger only in the presence of MBCSET, e.g. [a-z].
For other character sets, all bracket expressions, and `.` as well,
will trigger it.

* src/dfa.c (dfaexec): Fall back to glibc for multibyte matches,
if possible.
---
 src/dfa.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/src/dfa.c b/src/dfa.c
index 91124b6..3708be7 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3237,6 +3237,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
               continue;
             }
 
+          if (backref)
+            {
+              *backref = 1;
+              free(mblen_buf);
+              free(inputwcs);
+              *end = saved_end;
+              return (char *) p;
+            }
+
           /* Can match with a multibyte character (and multi character
              collating element).  Transition table might be updated.  */
           s = transit_state(d, s, &p);
--
1.7.1
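
The control flow of the hunk above can be illustrated outside of dfa.c.
What follows is only a minimal sketch, not code from grep: fast_scan,
slow_scan, find_match and needs_slow are hypothetical names, the
"pattern" is hard-wired to the set [aé], and slow_scan merely stands in
for the glibc matcher. The real hunk additionally frees mblen_buf and
inputwcs and restores *end before returning; the sketch has no such
state to clean up.

```c
#include <assert.h>
#include <stddef.h>

/* Slow path, standing in for a full multibyte matcher: find the first
   'a' or UTF-8 "é" (bytes 0xC3 0xA9) in [begin, end).  */
static const char *
slow_scan (const char *begin, const char *end)
{
  const char *p;
  for (p = begin; p < end; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (c == 'a')
        return p;
      if (c == 0xC3 && p + 1 < end && (unsigned char) p[1] == 0xA9)
        return p;
    }
  return NULL;
}

/* Fast path: understands ASCII only.  On the first non-ASCII byte it
   cannot decide, so if the caller passed NEEDS_SLOW it sets the flag
   and returns the position where it stopped, in the spirit of
   "*backref = 1; return (char *) p;" in the patch.  */
static const char *
fast_scan (const char *begin, const char *end, int *needs_slow)
{
  const char *p;
  assert (begin <= end);
  for (p = begin; p < end; p++)
    {
      unsigned char c = (unsigned char) *p;
      if (c >= 0x80)
        {
          if (needs_slow)
            {
              *needs_slow = 1;
              return p;
            }
          /* Nobody to hand off to: finish with the slow path here.  */
          return slow_scan (p, end);
        }
      if (c == 'a')
        return p;
    }
  return NULL;
}

/* Caller: try the fast path first, then rerun the slow matcher from
   the reported position if the fast path asked for it.  */
static const char *
find_match (const char *begin, const char *end)
{
  int needs_slow = 0;
  const char *p = fast_scan (begin, end, &needs_slow);
  return needs_slow ? slow_scan (p, end) : p;
}
```

The point of the out-parameter is that the fast matcher never has to
be correct for input it does not understand; it only has to notice
such input and report where it stopped. Passing NULL for needs_slow
keeps the old behaviour, just as dfaexec continues into transit_state
when no backref pointer is supplied.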