This patch works around some of the performance problems of multibyte grep.
The patch has been in RHEL-6 for a few months.  I think it is also a
correctness fix, since grep has no way to support multi-character
collating elements.

For UTF-8, the fallback should trigger only when the pattern contains
an MBCSET, e.g. [a-z].  For other character sets, every bracket
expression, as well as `.`, will trigger it.

* src/dfa.c (dfaexec): Fall back to glibc for multibyte matches,
if possible.
---
 src/dfa.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)
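
For readers unfamiliar with the backref convention, here is a rough,
self-contained sketch of the fallback pattern this hunk relies on.  It
is not grep code: fast_match and match_line are hypothetical names, the
fast path is a plain strstr, and "non-ASCII byte seen" is only an
assumed stand-in for the real trigger described above.  The point is
just that the fast matcher sets *backref and returns early, and the
caller then hands the decision to the regex matcher.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for dfaexec.  Looks for the literal NEEDLE in
   LINE, but gives up as soon as it meets a non-ASCII byte: it sets
   *BACKREF and returns the position where it stopped.  */
static const char *
fast_match (const char *needle, const char *line, int *backref)
{
  const unsigned char *p;

  for (p = (const unsigned char *) line; *p; p++)
    if (*p >= 0x80)
      {
        *backref = 1;   /* cannot decide here; caller must fall back */
        return (const char *) p;
      }
  return strstr (line, needle);
}

/* Hypothetical caller.  Returns 1 if PATTERN matches LINE, else 0.
   Trusts the fast path unless it asked for a fallback, in which case
   the POSIX regex matcher decides.  */
static int
match_line (const char *pattern, const char *line)
{
  int backref = 0;
  const char *hit = fast_match (pattern, line, &backref);
  regex_t re;
  int rc;

  if (!backref)
    return hit != NULL;

  if (regcomp (&re, pattern, REG_NOSUB) != 0)
    return 0;
  rc = regexec (&re, line, 0, NULL, 0);
  regfree (&re);
  return rc == 0;
}
```

With this sketch, match_line ("abc", "xxabcxx") is answered by the fast
path alone, while a line containing UTF-8 bytes such as "caf\xc3\xa9"
goes through regexec.  The sketch only agrees with regexec for patterns
without regex metacharacters, since the toy fast path is a literal
search.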

diff --git a/src/dfa.c b/src/dfa.c
index 91124b6..3708be7 100644
--- a/src/dfa.c
+++ b/src/dfa.c
@@ -3237,6 +3237,15 @@ dfaexec (struct dfa *d, char const *begin, char *end,
                 continue;
               }
 
+            if (backref)
+              {
+                *backref = 1;
+                free(mblen_buf);
+                free(inputwcs);
+                *end = saved_end;
+                return (char *) p;
+              }
+
             /* Can match with a multibyte character (and multi character
                collating element).  Transition table might be updated.  */
             s = transit_state(d, s, &p);
-- 
1.7.1

