Preparing for the release of grep-2.27, I was surprised to find inputs that triggered an infinite loop.
With yesterday's pre-release snapshot (http://meyering.net/grep/grep-2.26.37-03b4.tar.xz), this would not terminate: echo cx | LC_ALL=C src/grep -E 'c\b[x ]' I traced it to a recent change in gnulib's dfa.c and then banged my head against that code until I understood enough to fix it. I've attached the gnulib bug fix and the grep patch that adds a test case and pulls in the gnulib fix. I'll make a new snapshot later today.
From ca3ca77fc9e497ff5048a9f1ee0d1acc8e050f51 Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@fb.com> Date: Tue, 29 Nov 2016 10:45:46 -0800 Subject: [PATCH] dfa: avoid new infinite loop This would infloop: echo cx | LC_ALL=C grep -E 'c\b[x ]' * lib/dfa.c (dfastate): When constructing a new state table, we could initially declare that we had found a match, and later find that constraints eliminate that possibility, yet continue to use the now stale "matched" indicator. That would lead to an infinite loop. The solution is to update "matched" when necessary. Introduced by commit v0.1-983-g403adf1. --- ChangeLog | 11 +++++++++++ lib/dfa.c | 6 ++++++ 2 files changed, 17 insertions(+) diff --git a/ChangeLog b/ChangeLog index fd062ae..3b18083 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,14 @@ +2016-11-29 Jim Meyering <meyer...@fb.com> + + dfa: avoid new infinite loop + This would infloop: echo cx | LC_ALL=C grep -E 'c\b[x ]' + * lib/dfa.c (dfastate): When constructing a new state table, we could + initially declare that we had found a match, and later find that + constraints eliminate that possibility, yet continue to use the + now stale "matched" indicator. That would lead to an infinite loop. + The solution is to update "matched" when necessary. + Introduced by commit v0.1-983-g403adf1. + 2016-11-27 Norihiro Tanaka <nori...@kcn.ne.jp> dfa: avoid match middle in multibyte character diff --git a/lib/dfa.c b/lib/dfa.c index 673ef95..0412b2c 100644 --- a/lib/dfa.c +++ b/lib/dfa.c @@ -2609,6 +2609,12 @@ dfastate (state_num s, struct dfa *d, unsigned char uc, state_num trans[]) continue; if (j == CHARCLASS_WORDS) continue; + + /* If we have reset the bit that made us declare "matched", reset + that indicator, too. This is required to avoid an infinite loop + with this command: echo cx | LC_ALL=C grep -E 'c\b[x ]' */ + if (!tstbit (uc, matches)) + matched = false; } #ifdef DEBUG -- 2.9.3
From ae3f57c99c3c7b89cc15448a3365e71795a1a21b Mon Sep 17 00:00:00 2001 From: Jim Meyering <meyer...@fb.com> Date: Tue, 29 Nov 2016 10:55:30 -0800 Subject: [PATCH] grep: fix DFA-induced infloop * gnulib: Update to latest, for the DFA infloop fix. * tests/dfa-infloop: New test, to trigger an infinite loop in the DFA matcher. * tests/Makefile.am (TESTS): Add it. --- gnulib | 2 +- tests/Makefile.am | 1 + tests/dfa-infloop | 12 ++++++++++++ 3 files changed, 14 insertions(+), 1 deletion(-) create mode 100755 tests/dfa-infloop diff --git a/gnulib b/gnulib index 9cba42f..ca3ca77 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit 9cba42f87e1e88ac746e2341c51e78f9f640fefa +Subproject commit ca3ca77fc9e497ff5048a9f1ee0d1acc8e050f51 diff --git a/tests/Makefile.am b/tests/Makefile.am index 442e85a..3ded7a7 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -85,6 +85,7 @@ TESTS = \ count-newline \ dfa-coverage \ dfa-heap-overrun \ + dfa-infloop \ dfaexec-multibyte \ empty \ empty-line \ diff --git a/tests/dfa-infloop b/tests/dfa-infloop new file mode 100755 index 0000000..e35eef5 --- /dev/null +++ b/tests/dfa-infloop @@ -0,0 +1,12 @@ +#!/bin/sh +# This would infloop for some unreleased versions between 2.26 and 2.27 +. "${srcdir=.}/init.sh"; path_prepend_ ../src + +require_timeout_ + +fail=0 + +echo cx > in || framework_failure_ +returns_ 1 timeout 10 env LC_ALL=C grep -E 'c\b[x ]' in || fail=1 + +Exit $fail -- 2.9.3