On Sat, Jun 18, 2011 at 3:31 PM, Ingo Schwarze <[email protected]> wrote: > Hi Otto, > > from careful code inspection, your patch looks correct. > > Replacing the ">" by ">=" is wrong, it would breaks this test case: > > echo abc | ./obj/sed -E 's/(b|())/x/g' > > However, the code is hard to follow, and i suggest the following > refactoring such that people doing a review can more easily check > the correctness of the code after application of the patch: > > 1) X || Y || (!Y && Z) <=> X || Y || Z > so leave out the middle + line of your patch. > > 2) In case of match[0].rm_so == 0, > the cspace in the if block you touch has no effect > because copying zero bytes would be pointless. > Thus, we can pull the cspace before the main if, > optionally with an if (match[0].rm_so). > That's also much easier to understand: > *Always* copy the leading retained string, if any. > > 3) In the else clause of "Move past this match", > the if is completely pointless. > For vanishing rm_so, both clauses do the same thing. > > 4) Pull out the update of s and slen at the end. > In the if checking whether the current match is empty, > increment rm_eo exactly at the same time as copying the > additional character. > > The resulting code is much shorter and simpler, > and hence much easier to audit for correctness. > > However, i'm a bit scared about touching this code. > Breaking this would be really bad. > I'm currently running a make build with it. > > Yours, > Ingo > > >> Index: process.c >> =================================================================== >> RCS file: /cvs/src/usr.bin/sed/process.c,v >> retrieving revision 1.15 >> diff -u -p -r1.15 process.c >> --- process.c 27 Oct 2009 23:59:43 -0000 1.15 >> +++ process.c 15 Jun 2011 06:31:08 -0000 >> @@ -336,7 +336,9 @@ substitute(struct s_command *cp) >> switch (n) { >> case 0: /* Global */ >> do { >> - if (lastempty || match[0].rm_so != match[0].rm_eo) { >> + if (lastempty || match[0].rm_so != match[0].rm_eo || >> + (match[0].rm_so == match[0].rm_eo && >> + match[0].rm_so > 0)) { >> /* Locate start of replaced string. */ >> re_off = match[0].rm_so; >> /* Copy leading retained string. */ > > > Index: process.c > =================================================================== > RCS file: /cvs/src/usr.bin/sed/process.c,v > retrieving revision 1.15 > diff -u -p -r1.15 process.c > --- process.c 27 Oct 2009 23:59:43 -0000 1.15 > +++ process.c 18 Jun 2011 18:23:00 -0000 > @@ -336,31 +336,25 @@ substitute(struct s_command *cp) > switch (n) { > case 0: /* Global */ > do { > - if (lastempty || match[0].rm_so != match[0].rm_eo) { > - /* Locate start of replaced string. */ > - re_off = match[0].rm_so; > - /* Copy leading retained string. */ > - cspace(&SS, s, re_off, APPEND); > - /* Add in regular expression. */ > + /* Copy leading retained string. */ > + if (match[0].rm_so) > + cspace(&SS, s, match[0].rm_so, APPEND); > + > + /* Append replacement. */ > + if (lastempty || match[0].rm_so || > + match[0].rm_so != match[0].rm_eo) > regsub(&SS, s, cp->u.s->new); > - } > > - /* Move past this match. */ > - if (match[0].rm_so != match[0].rm_eo) { > - s += match[0].rm_eo; > - slen -= match[0].rm_eo; > - lastempty = 0; > - } else { > - if (match[0].rm_so == 0) > - cspace(&SS, s, match[0].rm_so + 1, > - APPEND); > - else > - cspace(&SS, s + match[0].rm_so, 1, > - APPEND); > - s += match[0].rm_so + 1; > - slen -= match[0].rm_so + 1; > + if (match[0].rm_so == match[0].rm_eo) { > + cspace(&SS, s + match[0].rm_eo++, 1, APPEND); > lastempty = 1; > - } > + } else > + lastempty = 0; > + > + /* Move past this match. */ > + s += match[0].rm_eo; > + slen -= match[0].rm_eo; > + > } while (slen > 0 && regexec_e(re, s, REG_NOTBOL, 0, slen)); > /* Copy trailing retained string. */ > if (slen > 0) > >
Hi Ingo untouched sed output # echo abc | sed -E 's/(b|())/x/g' xaxcx Otto patch + >=0 change sed output # echo abc | /usr/obj/usr.bin/sed/sed -E 's/(b|())/x/g' xaxxcx The second looks OK (http://www.regextester.com/, http://www.solmetra.com/scripts/regex/index.php) What do you think that is wrong? Thank you very much Regards

