On Wed, Apr 02, 2025 at 05:28:40PM -0500, Eric Blake via M4-patches wrote: > While a recursive descent parser is easy to write, it involves a LOT > of function calls and boilerplate. Merely parsing "eval(1)" requires > descending through ALL 11 levels of operator precedence, only for each > layer to discover there is no operator. Better is the Pratt style of > LR(1) parsing [1], which can handle any grammar where no two > consecutive non-terminals or epsilon appear in the right side of any > rule [2]. Now, parsing is done with just two mutually recursive > functions; "eval(1)" works with just two function calls (primary() > determines the value, and parse_expr() determines no operators are > present), while more complicated expressions still produce the correct > results but with less recursion. > > While at it, I noticed that "eval(1||(1/0))" used to produce a cryptic > message: > > m4:stdin:1: bad expression in eval (excess input): 1||(1/0)
Turns out "eval(1 || 1/0/0)" also fails pre-patch. I'll adjust the wording slightly to match reality. > > despite the similar "eval(1||1/0)" suppressing that as part of > short-circuiting. It turns out that my initial implementation of > short-circuiting in 1.4.8b (back in 2007!) was never fully tested on > more complex situations. > > To test that the new implementation is indeed faster, I wrote an m4 > solution [3] to an Advent of Code challenge [4] that required > computing 2000 iterations of a 24-bit linear feedback shift register > over 2000 input values (--trace shows nearly 20 million eval calls). > On my machine, runtime with an unoptimized pre-patch m4 was at 78 > seconds, post-patch it completes in 66 seconds. > > [1] > https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/ > [2] https://en.wikipedia.org/wiki/Operator-precedence_parser > [3] https://repo.or.cz/aoc_eblake.git/blob/1b122791d4:/2024/day22.m4 > [4] https://adventofcode.com/2024/day/22 Missing the "* src/eval.c" and similar ChangeLog blurb. Will fix. > --- > NEWS | 8 + > doc/m4.texi | 8 +- > src/eval.c | 771 ++++++++++++++++++---------------------------------- > 3 files changed, 272 insertions(+), 515 deletions(-) > > diff --git a/NEWS b/NEWS > index b3aa441c..2bac8ad5 100644 > --- a/NEWS > ** A number of portability improvements inherited from gnulib, including > the ability to perform stack overflow detection on more platforms > without linking to GNU libsigsegv. > > + > * Noteworthy changes in release 1.4.18d (2021-05-11) [beta] This cleanup is good, but unrelated, and fails 'make syntax-check'. > +++ b/src/eval.c > @@ -62,19 +62,41 @@ typedef enum eval_error > } > eval_error; > > -static eval_error logical_or_term (eval_token, int32_t *); > -static eval_error logical_and_term (eval_token, int32_t *); > -static eval_error or_term (eval_token, int32_t *); > -static eval_error xor_term (eval_token, int32_t *); > -static eval_error and_term (eval_token, int32_t *); > -static eval_error equality_term (eval_token, int32_t *); > -static eval_error cmp_term (eval_token, int32_t *); > -static eval_error shift_term (eval_token, int32_t *); > -static eval_error add_term (eval_token, int32_t *); > -static eval_error mult_term (eval_token, int32_t *); > -static eval_error exp_term (eval_token, int32_t *); > -static eval_error unary_term (eval_token, int32_t *); > -static eval_error simple_term (eval_token, int32_t *); > +/* Precedence of each operator, zero to end parse_expr(). */ > + > +static int precedence[] = { > + 0, /* ERROR */ > + 0, /* BADOP */ > + 9, /* PLUS - binary has precedence, unary consumed by primary() */ > + 9, /* MINUS - binary has precedence, unary consumed by primary() */ > + 11, /* EXPONENT - the only right-associative operator */ I just realized there's no need to have this be a separate array lookup - the precedence values can be encoded directly into enum eval_token (for example, make PLUS=90, MINUS=91, EXPONENT=110; then use et/10 for min_prec). > +/* Parse `(expr)', unary operators, and numbers. */ > +static eval_error > +primary (int32_t *v1) > +{ > + case LEFTP: > + if ((er = primary (v1)) >= SYNTAX_ERROR) > + return er; > + if ((er2 = parse_expr (v1, 1)) >= SYNTAX_ERROR) > + return er2; > + if (er == NO_ERROR && er2) > + er = er2; NO_ERROR == 0; the '&& er2' condition is pointless since assigning er=0 is safe. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org _______________________________________________ M4-patches mailing list M4-patches@gnu.org https://lists.gnu.org/mailman/listinfo/m4-patches