Paolo Bonzini wrote:
> This patch works around some of the performance problems of multibyte grep.
> The patch has been in RHEL-6 for a few months.  I think it is also a
> correctness patch, since grep has no way to support multi-character
> collation elements.
>
> For UTF-8 it should trigger only in the presence of MBCSET, e.g. [a-z].
> For other character sets all brackets and `.` as well will trigger it.
>
> * src/dfa.c (dfaexec): Fall back to glibc for multibyte matches,
> if possible.

Hi Paolo,

Thank you for the patch.
If this change really does fix a correctness bug, then it deserves
a NEWS entry with enough detail to confirm that, and, if at all possible,
a test suite addition.

Similarly, if it works around a performance problem, it would
help me evaluate it if you were to provide evidence.
Maybe this has already been done in some RHEL-6 bugzilla,
and you just forgot to include that?

Finally, please include some of the above in a comment in the code.

> ---
>  src/dfa.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/src/dfa.c b/src/dfa.c
> index 91124b6..3708be7 100644
> --- a/src/dfa.c
> +++ b/src/dfa.c
> @@ -3237,6 +3237,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
>              continue;
>            }
>
> +      if (backref)
> +        {
> +          *backref = 1;
> +          free(mblen_buf);
> +          free(inputwcs);
> +          *end = saved_end;
> +          return (char *) p;
> +        }
> +
>        /* Can match with a multibyte character (and multi character
>           collating element).  Transition table might be updated.  */
>        s = transit_state(d, s, &p);
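
For anyone reading the hunk out of context, here is a rough, self-contained
sketch of the control flow it introduces: a fast scanner that gives up and
sets *backref when it meets input it cannot handle cheaply, and a caller
that then retries with the regex matcher.  This is only an illustration
under stated assumptions, not grep code.  `fast_scan' and `match_line' are
hypothetical names, the "byte >= 0x80" check merely stands in for the
multibyte condition in dfaexec, and POSIX regcomp/regexec stands in for
the glibc matcher that grep would fall back to.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dfaexec: look for the byte C in [BEGIN, END).
   On reaching a non-ASCII byte, give up: set *BACKREF and return the
   position reached, so that the caller can retry with a slower matcher.
   If BACKREF is NULL, the byte is skipped instead.  */
static const char *
fast_scan (const char *begin, const char *end, char c, int *backref)
{
  const char *p;
  for (p = begin; p < end; p++)
    {
      if ((unsigned char) *p >= 0x80)
        {
          if (backref)
            {
              *backref = 1;
              return p;
            }
          continue;
        }
      if (*p == c)
        return p;
    }
  return NULL;
}

/* Hypothetical caller.  Return 1 if LINE matches, 0 if it does not,
   and -1 if PATTERN fails to compile.  *USED_FALLBACK records whether
   the regex matcher made the decision.  */
static int
match_line (const char *line, char c, const char *pattern,
            int *used_fallback)
{
  int backref = 0;
  const char *hit = fast_scan (line, line + strlen (line), c, &backref);

  *used_fallback = backref;
  if (!backref)
    return hit != NULL;

  /* The fast path gave up, so let regexec decide.  */
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

In this sketch a pure-ASCII line never reaches regexec, while a line
containing a non-ASCII byte always does.  That trade-off is the sort of
thing a comment in dfaexec could spell out.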