Currently, when the DFA superset matches across multiple lines, grep
returns to KWset.  That return is probably unnecessary: if the superset
matches across multiple lines, it very likely also matches within a
single line there.  Furthermore, returning to KWset after a multi-line
superset match keeps dfafast from working effectively.

This patch retries the DFA superset immediately after it matches across
multiple lines, instead of returning to KWset.
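
The retry idea can be sketched outside grep as follows.  This is only a
toy model, not the code in src/dfasearch.c: toy_superset and
find_candidate_line are hypothetical names, and the "superset" here is a
hand-written scanner for an 'a' followed later by a 'b' that is allowed
to cross newlines.  Like dfaexec, it returns a pointer just past the
match and bumps a counter for every newline it scans.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the superset DFA: find the first 'a' that is
   followed later by 'b', letting the gap cross newlines.  Return a pointer
   just past the 'b', or NULL if there is none.  *count is incremented for
   each newline scanned on the way.  */
static const char *
toy_superset (const char *beg, const char *end, size_t *count)
{
  int seen_a = 0;
  for (const char *p = beg; p < end; p++)
    {
      if (*p == '\n')
        ++*count;
      else if (*p == 'a')
        seen_a = 1;
      else if (*p == 'b' && seen_a)
        return p + 1;
    }
  return NULL;
}

/* Return the start of the first line in which the toy superset matches
   without crossing a newline, or NULL if there is no such line.  */
static const char *
find_candidate_line (const char *buf, const char *end)
{
  assert (buf <= end);
  const char *beg = buf;
  for (;;)
    {
      size_t count = 0;
      const char *next = toy_superset (beg, end, &count);
      if (next == NULL)
        return NULL;            /* No match at all.  */
      if (count == 0)
        return beg;             /* Match lies within the line at BEG.  */

      /* The match crossed a newline: move BEG to the start of the line
         that holds the end of the match, then retry right away.  */
      const char *p = next - 1;
      while (p > beg && p[-1] != '\n')
        p--;
      beg = p;
    }
}
```

Each retry starts at a later line than the previous one, so the loop
always terminates.  For the buffer "a\nxx\nb\nab\n" the sketch skips the
spurious multi-line match and settles on the line "ab".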

I confirmed the patch with the following test.

  $ yes abcdabc | head -50000000 >k
  $ env LC_ALL=C time -p src/grep '\(ab\)cd\1d' k

  before:
    real 3.48       user 3.41       sys 0.06

  after:
    real 2.14       user 2.07       sys 0.06
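
To put the two "real" timings in relation, the arithmetic can be written
out as below.  The helper names speedup and percent_saved are made up
for this illustration.  With 3.48 s before and 2.14 s after, the patched
binary is about 1.6 times faster, which is roughly 38% less wall-clock
time on this input.

```c
#include <assert.h>

/* How many times faster the "after" run is than the "before" run.  */
static double
speedup (double before, double after)
{
  assert (after > 0.0);
  return before / after;
}

/* Percentage of the "before" time that the "after" run saves.  */
static double
percent_saved (double before, double after)
{
  assert (before > 0.0);
  return 100.0 * (before - after) / before;
}
```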

Norihiro
From fafb93db6c618e69ded15317bd953a98463d200f Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <[email protected]>
Date: Fri, 9 May 2014 15:26:38 +0900
Subject: [PATCH] grep: retry DFA superset after matched with multiple lines by
 it

* src/dfasearch.c (EGexecute): Do it.
---
 src/dfasearch.c | 32 +++++++++++++++++++-------------
 1 file changed, 19 insertions(+), 13 deletions(-)

diff --git a/src/dfasearch.c b/src/dfasearch.c
index 4202666..9fb7449 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -284,26 +284,32 @@ EGexecute (char const *buf, size_t size, size_t *match_size,
           /* Try matching with the superset of DFA, if it's defined.  */
           if (superset && !exact_kwset_match)
             {
-              next_beg = dfaexec (superset, dfa_beg, (char *) end, 1,
-                                  &count, NULL);
-              /* If there's no match, or if we've matched the sentinel,
-                 we're done.  */
-              if (next_beg == NULL || next_beg == end)
-                continue;
-
-              /* Narrow down to the line we've found.  */
-              if (count != 0)
+              while (true)
                 {
+                  next_beg = dfaexec (superset, dfa_beg, (char *) end, 1,
+                                      &count, NULL);
+                  /* If there's no match, or if we've matched the sentinel,
+                     we're done.  */
+                  if (next_beg == NULL || next_beg == end)
+                    break;
+
+                  if (count == 0)
+                    break;
+                  count = 0;
+
                   /* If dfaexec may match in multiple lines, try to
                      match in one line.  */
-                  end = memrchr (buf, eol, next_beg - buf);
-                  end++;
-                  continue;
+                  beg = memrchr (buf, eol, next_beg - buf);
+                  beg = beg ? beg + 1 : buf;
+                  dfa_beg = beg;
                 }
+              if (next_beg == NULL || next_beg == end)
+                continue;
+
+              /* Narrow down to the line we've found.  */
               end = memchr (next_beg, eol, buflim - next_beg);
               end = end ? end + 1 : buflim;
             }
-
           /* Try matching with DFA.  */
           next_beg = dfaexec (dfa, dfa_beg, (char *) end, 0, &count, &backref);
 
-- 
1.9.2
