* On Thursday 2005-07-07 at 11:29:00 +0100, Julian Foad wrote: > Charles Levert wrote: > >Please comment. I'd really like to see this go > >in before 2.6, for the above reasons. > > I'm generally in favour of getting this in to Grep soon, unless it has the > potential to destabilise Grep by changing the behaviour in subtle ways or > creating a risk of bugs in the way we interface to it. Do you see any such > risks?
Yes, to give a prudent generic answer. But nothing specific has popped up so far. The best I can say is that the test suite passed (with the exception of the one test that failed, but that I decided needed to be changed to account for a newer more proper behavior). This is one of the best argument for having an extensive test suite, because it can give us more confidence when we need to do such moves. > > Big regex.[ch] update from latest glibc CVS libc/posix/. > > It's not appropriate to use the latest development version of some code. > Use the latest officially released version. (Maybe that's what you meant, > but it doesn't sound like it.) No, I meant latest glibc CVS. My approach is to first try to be bold, at least for testing at this point by this list's members before it even goes into our CVS, so we can clearly see what breaks. Then we can back down if we see actual problems. > > already a cpp macro (maybe even to a gcc internal). > > Does the same argument not apply to all the other __foo macros defined > there? Why? No, because at least on my system's /usr/include and /usr/lib/gcc-lib, it's the only one with such a clash between installed and glibc-internal. I checked, but I invite others with perhaps newer or different systems to check as well. If this little change makes sense, we should probably send it upstream to glibc people. > > * tests/spencer1.tests: 'a[b-a]' now has an 'Invalid range end'. > > That sentence didn't tell me what you'd changed. How about "Expect an > error ('Invalid range end') for 'a[b-a]'." Ok. > > * m4/regex.m4: Add a configure-time test of the system regex > > It's confusing to have two separate log entries for this file, one saying > that it's been replaced and the other saying that it's been changed. > Please combine them into a single entry. Ok, but I want to keep the two actions in this description because it's much simpler to track things down that way, since the file would appear to have been much more heavily modified otherwise, which it hasn't. > > for this last error to be correctly reported, number invalid > > return codes from 1 to 7, add AC_SUBST(REGEX_INCLUDE, ...), > > attempt to use RE_ICASE so compilation will fail if missing. > > That whole description is too compressed for me to be able to understand > it. Please could you show us the diff between the GnuLib version and your > version if that will help. See below. > (Most of my questions are: What is the > consequence of that new test? If the system glibc regex doesn't have what we need, then the whole test file fails and we won't use it. (Use the included one instead.) > What are "invalid return codes", and why are > you numbering them? So that by looking at config.log, one can easily assess precisely what when wrong. configure:9146: $? = 2 configure: program exited with status 2 configure: failed program was: That's a good thing. All the regex.m4 changes should probably be sent upstream to gnulib, once we agree on them. > Compilation of what will fail - do you mean Grep now > requires RE_ICASE so configure should fail if it's missing?) Yes. Not now, but soon, because we know we need it to fix some known bugs. One Red Hat patch is interesting with regards to this. The included regex was so old, they know they didn't want to use it but wanted to use their known system one (they being the ones that put it there, as a distributor). However, because --without-included-regex used to pick up the included "regex.h" anyways, which IMHO was a bug as I previously stated, they needed to patch "regex.h" but not "regex.c" as a minimal fix for them but that wasn't useful to everyone else using --with-included-regex. Now --without-included-regex will pick up the system's "regex.h", as it always should have, and not the included one. Plus we're fixing our included "regex.[ch]". ~~~~~~~~ One change below was just cosmetic (case 4), but it's more readable that way, because it visually separates the dual use of [] pairs (one m4, one regex). Please review case 1, and especially the length of the pattern which is 10 including the final "\n" (once m4-processed, this is "a[[:]:]]b\n"). Does the "\n" need to be there? Otherwise, should 9 be replaced by 10? --- m4/regex.m4.gnulib-cvs-1.38 2005-01-23 03:06:57 -0500 +++ m4/regex.m4 2005-07-06 23:47:56 -0400 @@ -1,7 +1,7 @@ #serial 22 -# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004 Free -# Software Foundation, Inc. +# Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2003, 2004, 2005 +# Free Software Foundation, Inc. # # This file is free software; the Free Software Foundation # gives unlimited permission to copy and/or distribute it, @@ -29,7 +29,7 @@ # the following run test, then default to *not* using the included regex.c. # If cross compiling, assume the test would fail and use the included # regex.c. The first failing regular expression is from `Spencer ere - # test #75' in grep-2.3. + # test #212' in grep-2.5.1a/tests/spencer2.tests. AC_CACHE_CHECK([for working re_compile_pattern], jm_cv_func_working_re_compile_pattern, AC_TRY_RUN( @@ -42,40 +42,46 @@ static struct re_pattern_buffer regex; const char *s; struct re_registers regs; - re_set_syntax (RE_SYNTAX_POSIX_EGREP); + re_set_syntax (RE_SYNTAX_POSIX_EGREP | RE_ICASE); memset (®ex, 0, sizeof (regex)); [s = re_compile_pattern ("a[[:@:>@:]]b\n", 9, ®ex);] /* This should fail with _Invalid character class name_ error. */ if (!s) exit (1); + /* This should fail with _Invalid range end_ error. */ + memset (®ex, 0, sizeof (regex)); + [s = re_compile_pattern ("a[b-a]", 6, ®ex);] + if (!s) + exit (2); + /* This should succeed, but doesn't for e.g. glibc-2.1.3. */ memset (®ex, 0, sizeof (regex)); s = re_compile_pattern ("{1", 2, ®ex); if (s) - exit (1); + exit (3); /* The following example is derived from a problem report against gawk from Jorge Stolfi <[EMAIL PROTECTED]>. */ memset (®ex, 0, sizeof (regex)); - s = re_compile_pattern ("[[an\371]]*n", 7, ®ex); + [s = re_compile_pattern ("[an\371]*n", 7, ®ex);] if (s) - exit (1); + exit (4); /* This should match, but doesn't for e.g. glibc-2.2.1. */ if (re_match (®ex, "an", 2, 0, ®s) != 2) - exit (1); + exit (5); memset (®ex, 0, sizeof (regex)); s = re_compile_pattern ("x", 1, ®ex); if (s) - exit (1); + exit (6); /* The version of regex.c in e.g. GNU libc-2.2.93 didn't work with a negative RANGE argument. */ if (re_search (®ex, "wxy", 3, 2, -2, ®s) != 1) - exit (1); + exit (7); exit (0); } @@ -101,6 +107,7 @@ if test "$jm_with_regex" = yes; then AC_LIBOBJ(regex) gl_PREREQ_REGEX + AC_SUBST(REGEX_INCLUDE, ['-I$(top_srcdir)/lib/regex']) fi ], )