Norihiro Tanaka wrote:
> By the way, I took into another bug by my previous patch. If
> `kwsm.index < kwset_exact_matches', don't have to run DFA for whole a buffer.
I found a issue in the patch.
If failed in DFA after succeed in kwset, doesn't return to kwset until
reaches the end of the buffer or find a match. By that, thought some
cases speed up, there is also a case slowdown.
Although A is speed-up, B is slowdown.
(A)
yes abcdabc | head -50000000 >k
env LC_ALL=C time -p src/grep abcd.bd k
(B)
yes "abcdabc
$(yes jjjjjjj | head -99)" | head -50000000 >k
env LC_ALL=C time -p src/grep abcd.bd k
In A, KWset doesn't work at all, and it's harmful.
OTOH, in B, It works effectively in A.
I considered only the case of A, but it's necessary to consider how B
does not slowdown.
I wrote the patch for the master to return to KWset, after checking with
DFA about 30 line. `30' is based on results of the tests.
However, I don't so like this patch, since the basis to 30 is weak...
Is there anyone that have any good ideas?
Norihiro
From 4d6a6029361edc2710c729fe99fb2ea56007bf84 Mon Sep 17 00:00:00 2001
From: Norihiro Tanaka <[email protected]>
Date: Sun, 27 Apr 2014 20:18:58 +0900
Subject: [PATCH] grep: ajustment of timing back to kwset when dfaisfast is
true
If failed in DFA after succeed in kwset, doesn't return to kwset until
reaches the end of the buffer or find a match. By that, thought some
cases speed up, there is also a case slowdown.
This patch ajusts timing back to kwset when dfaisfast is true.
* src/dfasearch.c (EGexecute): Do it.
---
src/dfasearch.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
diff --git a/src/dfasearch.c b/src/dfasearch.c
index 7edc96b..6167c4d 100644
--- a/src/dfasearch.c
+++ b/src/dfasearch.c
@@ -261,25 +261,38 @@ EGexecute (char const *buf, size_t size, size_t
*match_size,
if (! dfaexec (dfa, beg, (char *) end, 0, NULL, &backref))
continue;
}
+ else
+ {
+ end += (end - beg) * 30;
+ if (end < buflim)
+ {
+ end = memchr (end, eol, buflim - end);
+ end = end ? end + 1 : buflim;
+ }
+ else
+ end = buflim;
+ }
}
- if (!kwset || dfafast)
+ else
+ end = buflim;
+ if (!kwset || (dfafast && kwsm.index >= kwset_exact_matches))
{
/* No good fixed strings or DFA is fast; use DFA. */
size_t offset, count;
char const *next_beg;
count = 0;
- offset = dfahint (dfa, beg, (char *) buflim, &count);
+ offset = dfahint (dfa, beg, (char *) end, &count);
if (offset == (size_t) -1)
- goto failure;
+ continue;
if (offset == (size_t) -2)
{
/* No use hint. */
- next_beg = dfaexec (dfa, beg, (char *) buflim, 0,
+ next_beg = dfaexec (dfa, beg, (char *) end, 0,
NULL, &backref);
/* If there's no match, or if we've matched the sentinel,
we're done. */
- if (next_beg == NULL || next_beg == buflim)
- goto failure;
+ if (next_beg == NULL || next_beg == end)
+ continue;
}
else
next_beg = beg + offset;
--
1.9.2