Hi all,

this series is a pretty heavy refactoring of how anchors work in dfa.c. The main objective is to implement ^, $, \` and \' correctly when grep -z is in use. In particular, ^ and $ will now anchor against a newline character in the middle of a NUL-delimited record. This is a backwards-incompatible change.
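To make the intended semantics concrete, here is a rough analogy using Python's re module rather than dfa.c or grep itself. It assumes that Python's re.MULTILINE ^ and $ behave like the line anchors described above, and that Python's \A and \Z play the role of the buffer anchors \` and \'. The sample buffer and the matches() helper are made up for illustration.

```python
import re

# A buffer as "grep -z" would see it: records are delimited by NUL bytes,
# and the first record contains an embedded newline.
buffer = b"first line\nsecond line\0next record\n\0"
records = [r for r in buffer.split(b"\0") if r]


def matches(pattern, record):
    """Hypothetical helper: search one record with line anchors enabled."""
    return re.search(pattern, record, re.MULTILINE) is not None


rec = records[0]  # b"first line\nsecond line"

# Line anchors: ^ and $ anchor against the newline inside the record.
line_start_hit = matches(rb"^second", rec)
line_end_hit = matches(rb"first line$", rec)

# Buffer anchors: \A and \Z only match at the edges of the whole record.
buf_start_hit = matches(rb"\Asecond", rec)
buf_end_hit = matches(rb"first line\Z", rec)
buf_edges_hit = matches(rb"\Afirst", rec) and matches(rb"second line\Z", rec)
```

In this sketch the line anchors find "second" and "first line" inside the record, while the buffer anchors only match at the record's own start and end.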
It is still not ready for committing: in particular, I have not yet added tests, and I haven't worked out how the period character should work. However, having other people hammer on it would be very useful, for both "grep -z" and regular grep.

Patch 1 fixes an unrelated bug that I reported yesterday.

Patch 2 introduces symbolic names for the values of "sbit" and "d->success", and patches 3 and 4 use those values extensively throughout dfa.c, replacing separate variables or arguments. This is because a later patch in the series adds one more value.

Patch 5 gives the context field of a DFA state a more easily defined meaning, and one that I'm more comfortable hacking on.

Patches 6 and 7 are code simplifications.

Patch 8 reimplements constraints to make room for buffer constraints (\` and \').

Patch 9 renames the "newline character" concept to "buffer delimiter", since later patches modify ^ and $ to anchor against a hardcoded \n.

Patches 10 and 11 introduce the new feature, in the matcher and in the regex parser respectively.

Paolo Bonzini (11):
  dfa: fix corner case with anchors
  dfa: introduce contexts for the values in d->success
  dfa: change newline/letter to a single context value
  dfa: refactor common context computations
  dfa: change meaning of a state context
  dfa: remove useless check
  dfa: make repetitive code *really* repetitive
  dfa: remove redundant line constraints
  dfa: rename "newline" to "buffer delimiter"
  dfa: introduce bufdelim context
  dfa: introduce BEGBUF/ENDBUF

 NEWS                 |    5 +
 src/dfa.c            |  508 ++++++++++++++++++++++++++++++-------------------
 tests/spencer1.tests |   12 ++
 3 files changed, 323 insertions(+), 202 deletions(-)

-- 
1.7.7.1