Re: Order of operations should follow order of warnings in eval

Eric Blake Tue, 13 May 2025 08:04:54 -0700

On Tue, May 13, 2025 at 12:13:42AM -0400, Nikolaos Chatzikonstantinou wrote:
> I'm trying to think of what the output of this program should be:
> 
> eval(1 = 2 = 3 / 0)


The macro expansion should be empty, and at least a division by zero
warning message.  Beyond that, whether there is one or two warnings
about a deprecated use of = as ==, and in what order, is not fixed in
stone.

And you've identified something that happened to change in 1.4.20:

$ echo 'eval(1=1=1/0)dnl' | m4-1.4.19
m4:stdin:1: Warning: recommend ==, not =, for equality operator
m4:stdin:1: divide by zero in eval: 1=1=1/0
$ echo 'eval(1=1=1/0)dnl' | m4-1.4.20
m4:stdin:1: Warning: recommend ==, not =, for equality operator
m4:stdin:1: Warning: recommend ==, not =, for equality operator
m4:stdin:1: divide by zero in eval: 1=1=1/0

That change to a doubled warning on = was not deliberate, but also not
wrong.  In 1.4.19, division by zero was _supposed_ to be a late
warning (one that could be ignored if it occurs on the dead branch of
|| or &&), but was propagated incorrectly (as evidenced by the
eval(1||(1/0)) bug).  Meanwhile, if you step through the code of
1.4.19, the warning for = was produced AFTER first checking whether
the left and right side of = were both integers (and not errors).

In 1.4.20, division by zero _is_ a late warning, while = is a warning
produced the moment it is encountered, whether or not it is operating
on integers or on late warnings.  You can demonstrate this by seeing
that the warning occurs even in the dead branch of ||, but does NOT
occur if a fatal error (such as bad op) ends the parse early:

$ m4-1.4.19
eval(1 || (1=1))
m4:stdin:1: Warning: recommend ==, not =, for equality operator
1
eval(1 &= (1=1))
m4:stdin:2: invalid operator in eval: 1 &= (1=1)

$ m4-1.4.20
eval(1 || (1=1))
m4:stdin:1: Warning: recommend ==, not =, for equality operator
1
eval(1 &= (1=1))
m4:stdin:2: invalid operator in eval: 1 &= (1=1)

In branch-1.6, where = has been ripped out (well, turned into an
invalid operator like += and friends), the behavior is different again
- like other bad operators, it stops the parse immediately (whether
bad operator or missing ')' takes precedence is the subject of my
patch yesterday).  That opens up the door for an eventual m4 2.0 to
repurpose assignment operators into actually assigning values into
macro names [ie. eval(`a=1') could behave like define(`a', eval(1))].

> 
> Since it should be equal to
> 
> eval((1 = 2) = (3 / 0))
> 
> I'd assume it should warn as follows:
> 
> m4:stdin:1: Warning: recommend ==, not =, for equality operator
> m4:stdin:1: divide by zero in eval: 1 = 2 = 3 / 0
> m4:stdin:1: Warning: recommend ==, not =, for equality operator
> 
> which would be the breadth-first left-to-right evaluation strategy,
> but instead it warns like so:
> 
> m4:stdin:1: Warning: recommend ==, not =, for equality operator
> m4:stdin:1: Warning: recommend ==, not =, for equality operator
> m4:stdin:1: divide by zero in eval: 1 = 2 = 3 / 0
> 
> which is the depth-first left-to-right evaluation. Although, I'm not
> entirely sure.

Stepping through the code is the best way to see what actually
happened.  The first call to parse_expr starts with v1=1 and
consumes/peeks at <v1=1>" = 2 ="; the second = is the same priority,
so that is enough to set v1 to the result of "1=2", and that
computation is what outputs the first warning.  The loop then repeats,
and it now consumes/peeks at <v1=0>"= 3 /" which is an operator of
higher precedence, so it must recurse into a second parse_expr.  The
second parse_expr starts with v1=3 and consumes/peeks at <v1=3>" /
0\0" where end-of-string is a lower precedence operator, so it
computes the division "3/0", which results in a divide-by-zero value,
but no warning is issued yet.  Control flow then returns to the first
parse_expr, which is now sitting on <v1=0>=<v2=div0>"\0", and since
end of string is lower precedence, it is time to compute the =
operator, which outputs the second warning about = before returning a
div0 result.  Finally, the top-level evaluate() gets a result of div0,
and produces the final warning and empty output.

If we wanted, we could treat the warning for misuse of = the same as
division by 0 (something that only occurs at most once if the
expression is otherwise parseable, and can be silenced if it occurs in
the dead branch of || or &&) rather than something that occurs every
time it is encountered; and maybe even mask it out so that an eval()
produces at most one warning overall (ie. a div0 warning could silence
any = warnings).  But given that = is disappearing entirely for 1.6,
I'm not sure it is worth a 1.4.21 just for tweaks like that.  Further,
MOST m4 code won't trip the = messages (they have switched to == long
ago to silence it), and the more important fact is whether the
expansion of eval() is blank when there is a division by 0 error, with
a message explaining how eval failed, rather than if or how many other
warnings occurred alongside that error.

That said, I think it's fairly easy to change m4p to more closely
match m4 1.4.20 behavior, including better short-circuiting of || and
&&, and warnings that match m4 1.4.20 for all of eval(1%0),
eval(1=2=3/0), eval(1+=(2=3)), eval(1+=1+=1), and eval(`++a').  It
does not, however, match eval(1||(1+=1)), although I like the
resulting 'invalid operator' error of m4p with this patch better than
m4 1.4.20's 'missing right parenthesis' (and the patch I recently
posted for branch-1.6 has that nicer behavior as well).  Also,
eval(a++) reports 'bad expression (bad input)' with m4 1.4.20, but
only 'bad expression' with this patch in m4p.  It's up to you on how
hard you want to try to exactly match m4 1.4.20's behavior, short of
duplicating that parser identically rather than relying on Lark to
parse for you.  So this patch is not necessarily the best, but it's as
much as I could whip out today if you want to use parts or all of it.

My patch cheats and turns rules for invalid operators into consuming
the rest of the input (use of /.+/) as closest to what m4 1.4.20 does,
but I note that even without that cheat, your grammar isn't quite
right (in C, the += production is closer to 'assignment: atom "+="
assignment', rather than 'assignment: logical_op "+=" logical_op',
since the left hand side must be a single term and you can chain
assignments with right-associativity).

Not touched in this patch: I think there are places where you get the
wrong answer because you don't convert to c_int32 until at the end of
the evaluation, whereas m4 1.4.20 is doing ALL operations in signed
32-bit.  But I haven't yet been bothered to demonstrate such a
difference; maybe in a later email.

diff --git i/.gitignore w/.gitignore
index c14b1f1..92906e5 100644
--- i/.gitignore
+++ w/.gitignore
@@ -1,2 +1,3 @@
 *.egg-info
 __pycache__
+.venv
diff --git i/m4p/m4_eval.py w/m4p/m4_eval.py
index 6f2e565..51576b4 100644
--- i/m4p/m4_eval.py
+++ w/m4p/m4_eval.py
@@ -48,9 +48,9 @@ def radix_str(n: int, base: int) -> bytes:
 class CalcInfo(Enum):
     DEPRECATED_EQ = auto()
     DIV_BY_ZERO = auto()
+    MOD_BY_ZERO = auto()
     NEGATIVE_EXP = auto()
-    INVALID_UNARY_OP = auto()
-    INVALID_BINARY_OP = auto()
+    INVALID_OP = auto()


 class PropagatingError:
@@ -74,11 +74,7 @@ class BadInput(ParserError):
     pass


-class InvalidUnaryOperator(PropagatingError):
-    pass
-
-
-class InvalidBinaryOperator(PropagatingError):
+class InvalidOperator(PropagatingError):
     pass


@@ -86,26 +82,34 @@ class DivByZero(PropagatingError):
     pass


+class ModByZero(PropagatingError):
+    pass
+
+
 class NegExp(PropagatingError):
     pass


+class OperatorError(BaseException):
+    pass
+
+
 # For C's operator precedence see
 # <https://en.cppreference.com/w/c/language/operator_precedence>.
 calc_grammar = """
     ?start: assignment

     ?assignment: logical_or
-        | logical_or  "+=" logical_or -> invalid_binary_operator
-        | logical_or  "-=" logical_or -> invalid_binary_operator
-        | logical_or  "*=" logical_or -> invalid_binary_operator
-        | logical_or  "/=" logical_or -> invalid_binary_operator
-        | logical_or  "%=" logical_or -> invalid_binary_operator
-        | logical_or  "|=" logical_or -> invalid_binary_operator
-        | logical_or  "&=" logical_or -> invalid_binary_operator
-        | logical_or  "^=" logical_or -> invalid_binary_operator
-        | logical_or ">>=" logical_or -> invalid_binary_operator
-        | logical_or "<<=" logical_or -> invalid_binary_operator
+        | logical_or  "+=" /.+/ -> invalid_binary_operator
+        | logical_or  "-=" /.+/ -> invalid_binary_operator
+        | logical_or  "*=" /.+/ -> invalid_binary_operator
+        | logical_or  "/=" /.+/ -> invalid_binary_operator
+        | logical_or  "%=" /.+/ -> invalid_binary_operator
+        | logical_or  "|=" /.+/ -> invalid_binary_operator
+        | logical_or  "&=" /.+/ -> invalid_binary_operator
+        | logical_or  "^=" /.+/ -> invalid_binary_operator
+        | logical_or ">>=" /.+/ -> invalid_binary_operator
+        | logical_or "<<=" /.+/ -> invalid_binary_operator

     ?logical_or: logical_and
         | logical_or "||" logical_and -> logical_or
@@ -164,8 +168,8 @@ calc_grammar = """
          | "-" atom         -> neg
          | "~" atom         -> invert
          | "!" atom         -> not_
-         | "--" atom        -> invalid_unary_operator
-         | "++" atom        -> invalid_unary_operator
+         | "--" /.+/        -> invalid_unary_operator
+         | "++" /.+/        -> invalid_unary_operator
          | atom "--"        -> invalid_unary_operator
          | atom "++"        -> invalid_unary_operator
          | "(" assignment ")"
@@ -290,11 +294,11 @@ class CalculateTree(Transformer):
             return x == y

     def deprecated_eq(self, x, y):
+        self.info.append(CalcInfo.DEPRECATED_EQ)
         g = guard(x, y)
         if g:
             return g
         else:
-            self.info.append(CalcInfo.DEPRECATED_EQ)
             return x == y

     def lshift(self, x, y):
@@ -312,20 +316,10 @@ class CalculateTree(Transformer):
             return x >> (y & 0x1F)

     def invalid_unary_operator(self, x):
-        g = guard(x)
-        if g:
-            return g
-        else:
-            self.info.append(CalcInfo.INVALID_UNARY_OP)
-            return InvalidUnaryOperator()
+        raise OperatorError()

     def invalid_binary_operator(self, x, y):
-        g = guard(x, y)
-        if g:
-            return g
-        else:
-            self.info.append(CalcInfo.INVALID_BINARY_OP)
-            return InvalidBinaryOperator()
+        raise OperatorError()

     def pow(self, x, y):
         g = guard(x, y)
@@ -333,10 +327,8 @@ class CalculateTree(Transformer):
             return g
         else:
             if x == 0 and y == 0:
-                self.info.append(CalcInfo.DIV_BY_ZERO)
                 return DivByZero()
             elif y < 0:
-                self.info.append(CalcInfo.NEGATIVE_EXP)
                 return NegExp()
             else:
                 return x**y
@@ -347,8 +339,7 @@ class CalculateTree(Transformer):
             return g
         else:
             if y == 0:
-                self.info.append(CalcInfo.DIV_BY_ZERO)
-                return DivByZero()
+                return ModByZero()
             else:
                 return x % y

@@ -358,7 +349,6 @@ class CalculateTree(Transformer):
             return g
         else:
             if y == 0:
-                self.info.append(CalcInfo.DIV_BY_ZERO)
                 return DivByZero()
             else:
                 return x // y
@@ -389,20 +379,26 @@ class CalculateTree(Transformer):
         return x

     def logical_and(self, x, y):
+        g_x = guard(x)
+        if g_x:
+            return g_x
+        g_y = guard(y)
+        if g_y and x != 0:
+            return g_y
         if x == 0:
             return 0
-        if isinstance(x, DivByZero):
-            return x
-        elif isinstance(y, DivByZero):
-            return y
-        return int(x != 0 and y != 0)
+        return int(y != 0)

     def logical_or(self, x, y):
-        if isinstance(x, DivByZero):
-            return x
-        elif x == 0 and isinstance(y, DivByZero):
-            return y
-        return int(x != 0 or y != 0)
+        g_x = guard(x)
+        if g_x:
+            return g_x
+        g_y = guard(y)
+        if g_y and x == 0:
+            return g_y
+        if x != 0:
+            return 1
+        return int(y != 0)


 calc_parser = Lark(
@@ -413,7 +409,11 @@ calc_parser = Lark(
 def calc(s: bytes) -> int | PropagatingError:
     try:
         calc_parser.options.options["transformer"].reset()
+        info = calc_parser.options.options["transformer"].info
         result = calc_parser.parse(s)
+    except OperatorError as e:
+        info.append(CalcInfo.INVALID_OP)
+        return InvalidOperator()
     except ValueError:
         return BadInput()
     except UnexpectedCharacters as e:
@@ -422,6 +422,12 @@ def calc(s: bytes) -> int | PropagatingError:
         return ParserError()
     g = guard(result)
     if g:
+        if isinstance(g, DivByZero):
+            info.append(CalcInfo.DIV_BY_ZERO)
+        elif isinstance(g, ModByZero):
+            info.append(CalcInfo.MOD_BY_ZERO)
+        elif isinstance(g, NegExp):
+            info.append(CalcInfo.NEGATIVE_EXP)
         return g
     else:
         return c_int32(result).value
diff --git i/m4p/parser.py w/m4p/parser.py
index c387e1a..175df28 100644
--- i/m4p/parser.py
+++ w/m4p/parser.py
@@ -862,13 +862,17 @@ class Parser:
                     self.info(
                         pathname, line, b"divide by zero in eval: %b" % 
arg_expression
                     )
+                case CalcInfo.MOD_BY_ZERO:
+                    self.info(
+                        pathname, line, b"modulo by zero in eval: %b" % 
arg_expression
+                    )
                 case CalcInfo.NEGATIVE_EXP:
                     self.info(
                         pathname,
                         line,
                         b"negative exponent in eval: %b" % arg_expression,
                     )
-                case CalcInfo.INVALID_UNARY_OP | CalcInfo.INVALID_BINARY_OP:
+                case CalcInfo.INVALID_OP:
                     self.returncode = 1
                     self.info(
                         pathname, line, b"invalid operator in eval: %b" % 
arg_expression


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: Order of operations should follow order of warnings in eval

Reply via email to