Paolo Bonzini wrote: > * src/dfa.c (SUCCEEDS_IN_CONTEXT, PREV_NEWLINE_DEPENDENT, > PREV_LETTER_DEPENDENT): Rewrite to handle all 3*3=9 possible > combinations of previous and next character contexts. > (MATCHES_NEWLINE_CONTEXT, MATCHES_LETTER_CONTEXT): Remove. > (NO_CONSTRAINT, BEGLINE_CONSTRAINT, ENDLINE_CONSTRAINT, > BEGWORD_CONSTRAINT, ENDWORD_CONSTRAINT, LIMWORD_CONSTRAINT, > NOTLIMWORD_CONSTRAINT): Switch to new encoding. > * NEWS: Document resulting bugfix. > * tests/spencer1.tests: Add regression test. ... > + grep no longer misinterprets some alternations involving anchors > + (^, $, \< \> \B, \b). For example, grep -E "(^|\B)a" no > + longer reports a match for the string "x a". > + [bug present since "the beginning"] > + ... > diff --git a/src/dfa.c b/src/dfa.c ...
Very nice. The only suggestion I might make is to adjust your one-line summary Maybe "dfa: fix a subtle constraint-encoding bug" > #define PREV_NEWLINE_DEPENDENT(constraint) \ > - (((constraint) & 0xc0) >> 2 != ((constraint) & 0x30)) > + (PREV_NEWLINE_CONSTRAINT (constraint) != PREV_OTHER_CONSTRAINT > (constraint)) > #define PREV_LETTER_DEPENDENT(constraint) \ > - (((constraint) & 0x0c) >> 2 != ((constraint) & 0x03)) > + (PREV_LETTER_CONSTRAINT (constraint) != PREV_OTHER_CONSTRAINT (constraint)) Much more readable, too. It took me a few seconds to spot that 0xc0 vs. 0x0c difference. > /* Tokens that match the empty string subject to some constraint actually > work by applying that constraint to determine what may follow them, ...
