On Tue, May 13, 2025 at 12:13:42AM -0400, Nikolaos Chatzikonstantinou wrote: > I'm trying to think of what the output of this program should be: > > eval(1 = 2 = 3 / 0)
The macro expansion should be empty, and at least a division by zero warning message. Beyond that, whether there is one or two warnings about a deprecated use of = as ==, and in what order, is not fixed in stone. And you've identified something that happened to change in 1.4.20: $ echo 'eval(1=1=1/0)dnl' | m4-1.4.19 m4:stdin:1: Warning: recommend ==, not =, for equality operator m4:stdin:1: divide by zero in eval: 1=1=1/0 $ echo 'eval(1=1=1/0)dnl' | m4-1.4.20 m4:stdin:1: Warning: recommend ==, not =, for equality operator m4:stdin:1: Warning: recommend ==, not =, for equality operator m4:stdin:1: divide by zero in eval: 1=1=1/0 That change to a doubled warning on = was not deliberate, but also not wrong. In 1.4.19, division by zero was _supposed_ to be a late warning (one that could be ignored if it occurs on the dead branch of || or &&), but was propagated incorrectly (as evidenced by the eval(1||(1/0)) bug). Meanwhile, if you step through the code of 1.4.19, the warning for = was produced AFTER first checking whether the left and right side of = were both integers (and not errors). In 1.4.20, division by zero _is_ a late warning, while = is a warning produced the moment it is encountered, whether or not it is operating on integers or on late warnings. You can demonstrate this by seeing that the warning occurs even in the dead branch of ||, but does NOT occur if a fatal error (such as bad op) ends the parse early: $ m4-1.4.19 eval(1 || (1=1)) m4:stdin:1: Warning: recommend ==, not =, for equality operator 1 eval(1 &= (1=1)) m4:stdin:2: invalid operator in eval: 1 &= (1=1) $ m4-1.4.20 eval(1 || (1=1)) m4:stdin:1: Warning: recommend ==, not =, for equality operator 1 eval(1 &= (1=1)) m4:stdin:2: invalid operator in eval: 1 &= (1=1) In branch-1.6, where = has been ripped out (well, turned into an invalid operator like += and friends), the behavior is different again - like other bad operators, it stops the parse immediately (whether bad operator or missing ')' takes precedence is the subject of my patch yesterday). That opens up the door for an eventual m4 2.0 to repurpose assignment operators into actually assigning values into macro names [ie. eval(`a=1') could behave like define(`a', eval(1))]. > > Since it should be equal to > > eval((1 = 2) = (3 / 0)) > > I'd assume it should warn as follows: > > m4:stdin:1: Warning: recommend ==, not =, for equality operator > m4:stdin:1: divide by zero in eval: 1 = 2 = 3 / 0 > m4:stdin:1: Warning: recommend ==, not =, for equality operator > > which would be the breadth-first left-to-right evaluation strategy, > but instead it warns like so: > > m4:stdin:1: Warning: recommend ==, not =, for equality operator > m4:stdin:1: Warning: recommend ==, not =, for equality operator > m4:stdin:1: divide by zero in eval: 1 = 2 = 3 / 0 > > which is the depth-first left-to-right evaluation. Although, I'm not > entirely sure. Stepping through the code is the best way to see what actually happened. The first call to parse_expr starts with v1=1 and consumes/peeks at <v1=1>" = 2 ="; the second = is the same priority, so that is enough to set v1 to the result of "1=2", and that computation is what outputs the first warning. The loop then repeats, and it now consumes/peeks at <v1=0>"= 3 /" which is an operator of higher precedence, so it must recurse into a second parse_expr. The second parse_expr starts with v1=3 and consumes/peeks at <v1=3>" / 0\0" where end-of-string is a lower precedence operator, so it computes the division "3/0", which results in a divide-by-zero value, but no warning is issued yet. Control flow then returns to the first parse_expr, which is now sitting on <v1=0>=<v2=div0>"\0", and since end of string is lower precedence, it is time to compute the = operator, which outputs the second warning about = before returning a div0 result. Finally, the top-level evaluate() gets a result of div0, and produces the final warning and empty output. If we wanted, we could treat the warning for misuse of = the same as division by 0 (something that only occurs at most once if the expression is otherwise parseable, and can be silenced if it occurs in the dead branch of || or &&) rather than something that occurs every time it is encountered; and maybe even mask it out so that an eval() produces at most one warning overall (ie. a div0 warning could silence any = warnings). But given that = is disappearing entirely for 1.6, I'm not sure it is worth a 1.4.21 just for tweaks like that. Further, MOST m4 code won't trip the = messages (they have switched to == long ago to silence it), and the more important fact is whether the expansion of eval() is blank when there is a division by 0 error, with a message explaining how eval failed, rather than if or how many other warnings occurred alongside that error. That said, I think it's fairly easy to change m4p to more closely match m4 1.4.20 behavior, including better short-circuiting of || and &&, and warnings that match m4 1.4.20 for all of eval(1%0), eval(1=2=3/0), eval(1+=(2=3)), eval(1+=1+=1), and eval(`++a'). It does not, however, match eval(1||(1+=1)), although I like the resulting 'invalid operator' error of m4p with this patch better than m4 1.4.20's 'missing right parenthesis' (and the patch I recently posted for branch-1.6 has that nicer behavior as well). Also, eval(a++) reports 'bad expression (bad input)' with m4 1.4.20, but only 'bad expression' with this patch in m4p. It's up to you on how hard you want to try to exactly match m4 1.4.20's behavior, short of duplicating that parser identically rather than relying on Lark to parse for you. So this patch is not necessarily the best, but it's as much as I could whip out today if you want to use parts or all of it. My patch cheats and turns rules for invalid operators into consuming the rest of the input (use of /.+/) as closest to what m4 1.4.20 does, but I note that even without that cheat, your grammar isn't quite right (in C, the += production is closer to 'assignment: atom "+=" assignment', rather than 'assignment: logical_op "+=" logical_op', since the left hand side must be a single term and you can chain assignments with right-associativity). Not touched in this patch: I think there are places where you get the wrong answer because you don't convert to c_int32 until at the end of the evaluation, whereas m4 1.4.20 is doing ALL operations in signed 32-bit. But I haven't yet been bothered to demonstrate such a difference; maybe in a later email. diff --git i/.gitignore w/.gitignore index c14b1f1..92906e5 100644 --- i/.gitignore +++ w/.gitignore @@ -1,2 +1,3 @@ *.egg-info __pycache__ +.venv diff --git i/m4p/m4_eval.py w/m4p/m4_eval.py index 6f2e565..51576b4 100644 --- i/m4p/m4_eval.py +++ w/m4p/m4_eval.py @@ -48,9 +48,9 @@ def radix_str(n: int, base: int) -> bytes: class CalcInfo(Enum): DEPRECATED_EQ = auto() DIV_BY_ZERO = auto() + MOD_BY_ZERO = auto() NEGATIVE_EXP = auto() - INVALID_UNARY_OP = auto() - INVALID_BINARY_OP = auto() + INVALID_OP = auto() class PropagatingError: @@ -74,11 +74,7 @@ class BadInput(ParserError): pass -class InvalidUnaryOperator(PropagatingError): - pass - - -class InvalidBinaryOperator(PropagatingError): +class InvalidOperator(PropagatingError): pass @@ -86,26 +82,34 @@ class DivByZero(PropagatingError): pass +class ModByZero(PropagatingError): + pass + + class NegExp(PropagatingError): pass +class OperatorError(BaseException): + pass + + # For C's operator precedence see # <https://en.cppreference.com/w/c/language/operator_precedence>. calc_grammar = """ ?start: assignment ?assignment: logical_or - | logical_or "+=" logical_or -> invalid_binary_operator - | logical_or "-=" logical_or -> invalid_binary_operator - | logical_or "*=" logical_or -> invalid_binary_operator - | logical_or "/=" logical_or -> invalid_binary_operator - | logical_or "%=" logical_or -> invalid_binary_operator - | logical_or "|=" logical_or -> invalid_binary_operator - | logical_or "&=" logical_or -> invalid_binary_operator - | logical_or "^=" logical_or -> invalid_binary_operator - | logical_or ">>=" logical_or -> invalid_binary_operator - | logical_or "<<=" logical_or -> invalid_binary_operator + | logical_or "+=" /.+/ -> invalid_binary_operator + | logical_or "-=" /.+/ -> invalid_binary_operator + | logical_or "*=" /.+/ -> invalid_binary_operator + | logical_or "/=" /.+/ -> invalid_binary_operator + | logical_or "%=" /.+/ -> invalid_binary_operator + | logical_or "|=" /.+/ -> invalid_binary_operator + | logical_or "&=" /.+/ -> invalid_binary_operator + | logical_or "^=" /.+/ -> invalid_binary_operator + | logical_or ">>=" /.+/ -> invalid_binary_operator + | logical_or "<<=" /.+/ -> invalid_binary_operator ?logical_or: logical_and | logical_or "||" logical_and -> logical_or @@ -164,8 +168,8 @@ calc_grammar = """ | "-" atom -> neg | "~" atom -> invert | "!" atom -> not_ - | "--" atom -> invalid_unary_operator - | "++" atom -> invalid_unary_operator + | "--" /.+/ -> invalid_unary_operator + | "++" /.+/ -> invalid_unary_operator | atom "--" -> invalid_unary_operator | atom "++" -> invalid_unary_operator | "(" assignment ")" @@ -290,11 +294,11 @@ class CalculateTree(Transformer): return x == y def deprecated_eq(self, x, y): + self.info.append(CalcInfo.DEPRECATED_EQ) g = guard(x, y) if g: return g else: - self.info.append(CalcInfo.DEPRECATED_EQ) return x == y def lshift(self, x, y): @@ -312,20 +316,10 @@ class CalculateTree(Transformer): return x >> (y & 0x1F) def invalid_unary_operator(self, x): - g = guard(x) - if g: - return g - else: - self.info.append(CalcInfo.INVALID_UNARY_OP) - return InvalidUnaryOperator() + raise OperatorError() def invalid_binary_operator(self, x, y): - g = guard(x, y) - if g: - return g - else: - self.info.append(CalcInfo.INVALID_BINARY_OP) - return InvalidBinaryOperator() + raise OperatorError() def pow(self, x, y): g = guard(x, y) @@ -333,10 +327,8 @@ class CalculateTree(Transformer): return g else: if x == 0 and y == 0: - self.info.append(CalcInfo.DIV_BY_ZERO) return DivByZero() elif y < 0: - self.info.append(CalcInfo.NEGATIVE_EXP) return NegExp() else: return x**y @@ -347,8 +339,7 @@ class CalculateTree(Transformer): return g else: if y == 0: - self.info.append(CalcInfo.DIV_BY_ZERO) - return DivByZero() + return ModByZero() else: return x % y @@ -358,7 +349,6 @@ class CalculateTree(Transformer): return g else: if y == 0: - self.info.append(CalcInfo.DIV_BY_ZERO) return DivByZero() else: return x // y @@ -389,20 +379,26 @@ class CalculateTree(Transformer): return x def logical_and(self, x, y): + g_x = guard(x) + if g_x: + return g_x + g_y = guard(y) + if g_y and x != 0: + return g_y if x == 0: return 0 - if isinstance(x, DivByZero): - return x - elif isinstance(y, DivByZero): - return y - return int(x != 0 and y != 0) + return int(y != 0) def logical_or(self, x, y): - if isinstance(x, DivByZero): - return x - elif x == 0 and isinstance(y, DivByZero): - return y - return int(x != 0 or y != 0) + g_x = guard(x) + if g_x: + return g_x + g_y = guard(y) + if g_y and x == 0: + return g_y + if x != 0: + return 1 + return int(y != 0) calc_parser = Lark( @@ -413,7 +409,11 @@ calc_parser = Lark( def calc(s: bytes) -> int | PropagatingError: try: calc_parser.options.options["transformer"].reset() + info = calc_parser.options.options["transformer"].info result = calc_parser.parse(s) + except OperatorError as e: + info.append(CalcInfo.INVALID_OP) + return InvalidOperator() except ValueError: return BadInput() except UnexpectedCharacters as e: @@ -422,6 +422,12 @@ def calc(s: bytes) -> int | PropagatingError: return ParserError() g = guard(result) if g: + if isinstance(g, DivByZero): + info.append(CalcInfo.DIV_BY_ZERO) + elif isinstance(g, ModByZero): + info.append(CalcInfo.MOD_BY_ZERO) + elif isinstance(g, NegExp): + info.append(CalcInfo.NEGATIVE_EXP) return g else: return c_int32(result).value diff --git i/m4p/parser.py w/m4p/parser.py index c387e1a..175df28 100644 --- i/m4p/parser.py +++ w/m4p/parser.py @@ -862,13 +862,17 @@ class Parser: self.info( pathname, line, b"divide by zero in eval: %b" % arg_expression ) + case CalcInfo.MOD_BY_ZERO: + self.info( + pathname, line, b"modulo by zero in eval: %b" % arg_expression + ) case CalcInfo.NEGATIVE_EXP: self.info( pathname, line, b"negative exponent in eval: %b" % arg_expression, ) - case CalcInfo.INVALID_UNARY_OP | CalcInfo.INVALID_BINARY_OP: + case CalcInfo.INVALID_OP: self.returncode = 1 self.info( pathname, line, b"invalid operator in eval: %b" % arg_expression -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org