On Wed, Apr 02, 2025 at 05:28:40PM -0500, Eric Blake via M4-patches wrote:
> While a recursive descent parser is easy to write, it involves a LOT
> of function calls and boilerplate.  Merely parsing "eval(1)" requires
> descending through ALL 11 levels of operator precedence, only for each
> layer to discover there is no operator.  Better is the Pratt style of
> LR(1) parsing [1], which can handle any grammar where no two
> consecutive non-terminals or epsilon appear in the right side of any
> rule [2].  Now, parsing is done with just two mutually recursive
> functions; "eval(1)" works with just two function calls (primary()
> determines the value, and parse_expr() determines no operators are
> present), while more complicated expressions still produce the correct
> results but with less recursion.
> 
> While at it, I noticed that "eval(1||(1/0))" used to produce a cryptic
> message:
> 
> m4:stdin:1: bad expression in eval (excess input): 1||(1/0)

Turns out "eval(1 || 1/0/0)" also fails pre-patch.  I'll adjust the
wording slightly to match reality.

> 
> despite the similar "eval(1||1/0)" suppressing that as part of
> short-circuiting.  It turns out that my initial implementation of
> short-circuiting in 1.4.8b (back in 2007!) was never fully tested on
> more complex situations.
> 
> To test that the new implementation is indeed faster, I wrote an m4
> solution [3] to an Advent of Code challenge [4] that required
> computing 2000 iterations of a 24-bit linear feedback shift register
> over 2000 input values (--trace shows nearly 20 million eval calls).
> On my machine, runtime with an unoptimized pre-patch m4 was at 78
> seconds, post-patch it completes in 66 seconds.
> 
> [1] 
> https://journal.stuffwithstuff.com/2011/03/19/pratt-parsers-expression-parsing-made-easy/
> [2] https://en.wikipedia.org/wiki/Operator-precedence_parser
> [3] https://repo.or.cz/aoc_eblake.git/blob/1b122791d4:/2024/day22.m4
> [4] https://adventofcode.com/2024/day/22

Missing the "* src/eval.c" and similar ChangeLog blurb.  Will fix.

> ---
>  NEWS        |   8 +
>  doc/m4.texi |   8 +-
>  src/eval.c  | 771 ++++++++++++++++++----------------------------------
>  3 files changed, 272 insertions(+), 515 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index b3aa441c..2bac8ad5 100644
> --- a/NEWS

>  ** A number of portability improvements inherited from gnulib, including
>     the ability to perform stack overflow detection on more platforms
>     without linking to GNU libsigsegv.
> 
> +
>  * Noteworthy changes in release 1.4.18d (2021-05-11) [beta]

This cleanup is good, but unrelated, and fails 'make syntax-check'.

> +++ b/src/eval.c
> @@ -62,19 +62,41 @@ typedef enum eval_error
>    }
>  eval_error;
> 
> -static eval_error logical_or_term (eval_token, int32_t *);
> -static eval_error logical_and_term (eval_token, int32_t *);
> -static eval_error or_term (eval_token, int32_t *);
> -static eval_error xor_term (eval_token, int32_t *);
> -static eval_error and_term (eval_token, int32_t *);
> -static eval_error equality_term (eval_token, int32_t *);
> -static eval_error cmp_term (eval_token, int32_t *);
> -static eval_error shift_term (eval_token, int32_t *);
> -static eval_error add_term (eval_token, int32_t *);
> -static eval_error mult_term (eval_token, int32_t *);
> -static eval_error exp_term (eval_token, int32_t *);
> -static eval_error unary_term (eval_token, int32_t *);
> -static eval_error simple_term (eval_token, int32_t *);
> +/* Precedence of each operator, zero to end parse_expr().  */
> +
> +static int precedence[] = {
> +  0, /* ERROR */
> +  0, /* BADOP */
> +  9, /* PLUS - binary has precedence, unary consumed by primary() */
> +  9, /* MINUS - binary has precedence, unary consumed by primary() */
> +  11, /* EXPONENT - the only right-associative operator */

I just realized there's no need to have this be a separate array
lookup - the precedence values can be encoded directly into enum
eval_token (for example, make PLUS=90, MINUS=91, EXPONENT=110; then
use et/10 for min_prec).

> +/* Parse `(expr)', unary operators, and numbers.  */
> +static eval_error
> +primary (int32_t *v1)
> +{

> +    case LEFTP:
> +      if ((er = primary (v1)) >= SYNTAX_ERROR)
> +        return er;
> +      if ((er2 = parse_expr (v1, 1)) >= SYNTAX_ERROR)
> +        return er2;
> +      if (er == NO_ERROR && er2)
> +        er = er2;

NO_ERROR == 0; the '&& er2' condition is pointless since assigning
er=0 is safe.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org


_______________________________________________
M4-patches mailing list
M4-patches@gnu.org
https://lists.gnu.org/mailman/listinfo/m4-patches

Reply via email to