On 09/11/2017 05:04 PM, Julian Büning wrote:
> observed behavior:
> 
> $ echo | ptx -S $ &
> [1] 1000
> $ jobs
> [1]+  Running                 echo | ptx -S $ &
> 
> expected behavior:
> 
> $ echo | ptx -S $ &
> [1] 1000
> [1]+  Done                    echo | ptx -S $
> 
> ptx does not terminate in case the specified sentence regex can be
> matched but has a match of length zero on input that is non-empty.
> 
> The following test cases show the same behavior:
> $ echo | ptx -S ^
> $ echo | ptx -S "a*"
> $ echo | ptx -S "\(\)"
> $ echo test | ptx -S "\n*"
> $ echo foo > non_empty; ptx non_empty -S $
> ...
> 
> In ptx.c, find_occurs_in_text() calls re_search() and uses the length of
> a match (which is falsely assumed to be greater than zero) to advance a
> cursor through the input. For a match length of zero, the cursor is
> never advanced.
> 
> When switching on the results of re_search(), a case 0 could be added.
> One possible fix would be to then abort with an error message.
> 
> We found this behavior in version 8.27 and can reproduce it in version
> 8.25 as well as version 8.28.
> 
> This behavior was found using Symbolic Execution techniques developed in
> the course of the SYMBIOSYS research project at COMSYS, RWTH Aachen
> University. This research is supported by the European Research Council
> (ERC) under the EU's Horizon 2020 Research and Innovation Programme
> grant agreement n. 647295 (SYMBIOSYS).

Good catch!
The attached patch fixes it; please check.
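
For anyone following along, here is a minimal sketch of the stall described
in the report, written outside of ptx. It is an illustration only, under
these assumptions: it uses POSIX regcomp()/regexec() instead of the
re_search() call that ptx.c uses, and count_matches() is a hypothetical
helper, not a function from coreutils. The cursor is advanced by the end
offset of each match; when that offset is 0 the cursor would never move, so
the sketch reports this case instead of looping.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper (not from ptx.c): count the matches of PATTERN
   (POSIX basic regex syntax) in TEXT, moving a cursor past each match.
   Return the number of matches, -1 if a match would leave the cursor
   where it is (end offset 0), or -2 if PATTERN fails to compile.  */
static int
count_matches (const char *text, const char *pattern)
{
  regex_t re;
  regmatch_t m;
  int count = 0;

  if (regcomp (&re, pattern, 0) != 0)
    return -2;

  const char *cursor = text;
  while (*cursor != '\0' && regexec (&re, cursor, 1, &m, 0) == 0)
    {
      if (m.rm_eo == 0)
        {
          /* Empty match at the cursor: "cursor += m.rm_eo" would not
             advance, so an unguarded loop would spin forever here.  */
          regfree (&re);
          return -1;
        }
      cursor += m.rm_eo;        /* Step past the match just found.  */
      count++;
    }

  regfree (&re);
  return count;
}
```

With this sketch, count_matches ("a\n", "^") returns -1 rather than hanging,
while a pattern that always consumes input, such as "\\." on "a. b. c",
returns the match count. The patch takes the same general approach of
detecting the problematic case and stopping with a diagnostic.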

Have a nice day,
Berny
From f082748dbdcab15de50e2e1dc26a7276aec2432c Mon Sep 17 00:00:00 2001
From: Bernhard Voelker <m...@bernhard-voelker.de>
Date: Wed, 13 Sep 2017 23:37:20 +0200
Subject: [PATCH] ptx: avoid infloop due to zero-length matches with -S regex

* src/ptx.c (find_occurs_in_text): Die with an appropriate error
diagnostic when the given regular expression returns a match of
length 0.
* tests/misc/ptx.pl (S-infloop): Add a test.
* NEWS (Bug fixes): Mention the fix.

Fixes https://bugs.gnu.org/28417 which was detected using
Symbolic Execution techniques developed in the course of the
SYMBIOSYS research project at COMSYS, RWTH Aachen University.
---
 NEWS              | 5 +++++
 src/ptx.c         | 5 +++++
 tests/misc/ptx.pl | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/NEWS b/NEWS
index 8be22f1..ca36063 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,11 @@ GNU coreutils NEWS                                    -*- outline -*-
 
 * Noteworthy changes in release ?.? (????-??-??) [?]
 
+** Bug fixes
+
+  ptx -S no longer infloops for a pattern which returns zero-length matches.
+  [the bug dates back to the initial implementation]
+
 
 * Noteworthy changes in release 8.28 (2017-09-01) [stable]
 
diff --git a/src/ptx.c b/src/ptx.c
index 2aababf..b7aa107 100644
--- a/src/ptx.c
+++ b/src/ptx.c
@@ -818,6 +818,11 @@ find_occurs_in_text (int file_index)
           case -1:
             break;
 
+          case 0:
+            die (EXIT_FAILURE, 0,
+                 _("error: regular expression has a match of length zero: %s"),
+                 quote (context_regex.string));
+
           default:
             next_context_start = cursor + context_regs.end[0];
             break;
diff --git a/tests/misc/ptx.pl b/tests/misc/ptx.pl
index d71d065..4d4e1c7 100755
--- a/tests/misc/ptx.pl
+++ b/tests/misc/ptx.pl
@@ -40,6 +40,12 @@ my @Tests =
                               {OUT=>".xx \"\" \"\" \"foo\" \"\"\n"}],
 ["format-t", '--format=tex',  {IN=>"foo\n"},
                               {OUT=>"\\xx {}{}{foo}{}{}\n"}],
+
+# with coreutils-8.28 and earlier, the -S option would infloop with
+# matches of zero-length.
+["S-infloop", '-S ^', {IN=>"a\n"}, {EXIT=>1},
+                      {ERR_SUBST=>'s/^.*reg.*ex.*length zero.*$/regexlzero/'},
+                      {ERR=>"regexlzero\n"}],
 );
 
 @Tests = triple_test \@Tests;
-- 
2.1.4
