Matthew Burgess wrote: > On 19/03/2010 10:03, Jim Meyering wrote: >> We expect a reversed range to make grep give a diagnostic and exit 2. >> With glibc (--without-included-regex), you get no diagnostic and exit 1. > > OK, following the gnulib change at > http://lists.gnu.org/archive/html/bug-gnulib/2010-02/msg00006.html and > subsequent bug report at > http://sourceware.org/bugzilla/show_bug.cgi?id=11244 I've come up with > the attached patch (against the latest snapshot). It results in grep > using Glibc's regex implementation and all tests pass. > > However, I will admit (hopefully before the laughter starts!) that I'm > not an experienced C coder, so the patch might just be a fluke! I am > certainly more than happy to receive constructive criticism on its > contents or format though.
Thanks for contributing! For future reference, please see grep's README-hacking and http://git.sv.gnu.org/cgit/coreutils.git/plain/HACKING (yes, I should add that in grep's repository, too) You would notice that "configure" is not version-controlled, since it's generated, and that the pieces that contribute to its creation come almost entirely from gnulib. > diff -Naur grep-2.5.4.183-9159.orig/configure grep-2.5.4.183-9159/configure > --- grep-2.5.4.183-9159.orig/configure 2010-03-17 17:01:06.000000000 > +0000 > +++ grep-2.5.4.183-9159/configure 2010-03-19 22:13:34.000000000 +0000 > @@ -14652,7 +14652,7 @@ > return 1; > > /* Ensure that [b-a] is diagnosed as invalid. */ > - re_set_syntax (RE_SYNTAX_POSIX_EGREP); > + re_set_syntax (RE_SYNTAX_POSIX_EGREP | RE_NO_EMPTY_RANGES); That's precisely the change I made. However, the place to make it is in gnulib. Here's the discussion, and a related change: http://article.gmane.org/gmane.comp.lib.gnulib.bugs/21015 http://article.gmane.org/gmane.comp.lib.gnulib.bugs/21016 > diff -Naur grep-2.5.4.183-9159.orig/src/search.c > grep-2.5.4.183-9159/src/search.c > --- grep-2.5.4.183-9159.orig/src/search.c 2010-03-17 10:23:12.000000000 > +0000 > +++ grep-2.5.4.183-9159/src/search.c 2010-03-19 22:14:06.000000000 +0000 > @@ -368,7 +368,7 @@ > > COMPILE_FCT(Ecompile) > { > - return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP); > + return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP | > RE_NO_EMPTY_RANGES); Thanks. That's part of the patch, and it does cause "make check" to pass even with the latest from gnulib. Your change makes this command give a diagnostic and exit with status 2, as it should: $ :|./grep -E '[b-a]' ./grep: Invalid range end but not these two others (which would give no diagnostic and still exit with status 1): :|./grep '[b-a]' :|./egrep '[b-a]' So I've done three things: - add a test to provide coverage of those two other cases - update to latest gnulib, avoiding a spurious test failure - add RE_NO_EMPTY_RANGES in the other two places I'll push the following three patches shortly: >From bf498a94165496a987f3ac800d6542028ea85a2c Mon Sep 17 00:00:00 2001 From: Jim Meyering <[email protected]> Date: Sat, 20 Mar 2010 10:52:20 +0100 Subject: [PATCH 1/3] reject reversed-endpoint ranges, with all regex variants * src/search.c: Add RE_NO_EMPTY_RANGES to the syntax bits in three places, so that all of grep, egrep, and grep -E reject a range with reversed endpoints like '[b-a]'. This is required, when using the latest version of gnulib's regex module, since it now honors the RE_NO_EMPTY_RANGES flag, rather than acting as if it were always set. Based on a change by Matthew Burgess. --- src/search.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/src/search.c b/src/search.c index c986d48..f2d9ed6 100644 --- a/src/search.c +++ b/src/search.c @@ -260,7 +260,7 @@ is_mb_middle(const char **good, const char *buf, const char *end) #ifdef EGREP_PROGRAM COMPILE_FCT(Ecompile) { - reg_syntax_t syntax_bits = RE_SYNTAX_POSIX_EGREP; + reg_syntax_t syntax_bits = RE_SYNTAX_POSIX_EGREP | RE_NO_EMPTY_RANGES; #else /* No __VA_ARGS__ in C89. So we have to do it this way. */ static COMPILE_RET @@ -358,7 +358,9 @@ GEAcompile (char const *pattern, size_t size, reg_syntax_t syntax_bits) COMPILE_FCT(Gcompile) { return GEAcompile (pattern, size, - RE_SYNTAX_GREP | RE_HAT_LISTS_NOT_NEWLINE); + (RE_SYNTAX_GREP + | RE_HAT_LISTS_NOT_NEWLINE + | RE_NO_EMPTY_RANGES)); } COMPILE_FCT(Acompile) @@ -368,7 +370,7 @@ COMPILE_FCT(Acompile) COMPILE_FCT(Ecompile) { - return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP); + return GEAcompile (pattern, size, RE_SYNTAX_POSIX_EGREP | RE_NO_EMPTY_RANGES); } #endif /* !EGREP_PROGRAM */ -- 1.7.0.2.455.g91132 >From 812f1dbb1b54a63eff06aaea1d2038f31eb9ae0e Mon Sep 17 00:00:00 2001 From: Jim Meyering <[email protected]> Date: Sat, 20 Mar 2010 09:40:30 +0100 Subject: [PATCH 2/3] build: update gnulib submodule to latest This pulls in the latest regex module from gnulib, including a fix to make it honor the RE_NO_EMPTY_RANGES syntax bit. tests: temporarily disable irrelevant-to-grep failing C++ fcntl-h-tests * bootstrap.conf (gnulib_tool_option_extras): Temporarily add --avoid=fcntl-h-tests, until the C++ part of that test is fixed. --- bootstrap.conf | 1 + build-aux/.gitignore | 1 + gnulib | 2 +- 3 files changed, 3 insertions(+), 1 deletions(-) diff --git a/bootstrap.conf b/bootstrap.conf index 0f38fcb..2170e19 100644 --- a/bootstrap.conf +++ b/bootstrap.conf @@ -124,6 +124,7 @@ fi test -f ChangeLog || touch ChangeLog || exit 1 gnulib_tool_option_extras="--tests-base=$bt/gnulib-tests --with-tests" +gnulib_tool_option_extras="$gnulib_tool_option_extras --avoid=fcntl-h-tests" # Build prerequisites buildreq="\ diff --git a/build-aux/.gitignore b/build-aux/.gitignore index ea3ae09..38b748c 100644 --- a/build-aux/.gitignore +++ b/build-aux/.gitignore @@ -1,5 +1,6 @@ announce-gen arg-nonnull.h +c++defs.h config.guess config.rpath config.sub diff --git a/gnulib b/gnulib index aadf332..602e3e6 160000 --- a/gnulib +++ b/gnulib @@ -1 +1 @@ -Subproject commit aadf332a5c5f209affd38c35b4e9faaa8a0adecb +Subproject commit 602e3e6b709592f883ebb7bf58df1f955ea4b8f2 -- 1.7.0.2.455.g91132 >From 159c02545be1bd4342d27c7ea5b9d06459d3c8aa Mon Sep 17 00:00:00 2001 From: Jim Meyering <[email protected]> Date: Sat, 20 Mar 2010 10:21:32 +0100 Subject: [PATCH 3/3] tests: ensure that all programs handle [b-a] consistently * tests/reversed-range-endpoints: New test. * tests/Makefile.am (TESTS): Add it. --- tests/Makefile.am | 1 + tests/reversed-range-endpoints | 20 ++++++++++++++++++++ 2 files changed, 21 insertions(+), 0 deletions(-) create mode 100644 tests/reversed-range-endpoints diff --git a/tests/Makefile.am b/tests/Makefile.am index d16cc21..67763b2 100644 --- a/tests/Makefile.am +++ b/tests/Makefile.am @@ -36,6 +36,7 @@ TESTS = \ options.sh \ pcre.sh \ pcre-z \ + reversed-range-endpoints \ spencer1.sh \ spencer1-locale \ status.sh \ diff --git a/tests/reversed-range-endpoints b/tests/reversed-range-endpoints new file mode 100644 index 0000000..e80c07a --- /dev/null +++ b/tests/reversed-range-endpoints @@ -0,0 +1,20 @@ +#!/bin/sh +# Ensure that an invalid range like [b-a] evokes exit status of 2. +: ${srcdir=.} +. "$srcdir/init.sh"; path_prepend_ ../src + +fail=0 + +printf 'Invalid range end\n' > exp +for prog in grep egrep 'grep -E'; do + $prog '[b-a]' < /dev/null > out 2>&1 + # exit status must be 2, not 1 + test $? = 2 || fail=1 + + # Remove "program_name: " prefix from actual output. + sed 's/^[a-z]*: //' out > k && mv k out + + compare out exp || fail=1 +done + +Exit $fail -- 1.7.0.2.455.g91132
