samredai commented on PR #4466:
URL: https://github.com/apache/iceberg/pull/4466#issuecomment-1107157069
The ValueError getting raised comes from the fact that `__init__` is always
called after `__new__` when you call the standard class constructor. The first
argument to `__init__` (meaning `self`) is the value returned from `__new__`.
However for the `if rest` case that's in `__new__` here, a fully constructed
instance is returned by `return reduce(And, (left, right, *rest))` and so
`__init__` doesn't need to do anything, which is why it was only doing
something `if not rest`. So whether or not `rest` is provided is being used as
a sort of signal to `__init__` on whether or not the class has already been
fully initialized in `__new__`. Adding the `ValueError` disrupts the "already
initialized path" by causing `__init__` to panic.
Two other alternatives I can think of is:
1. Move the initialization entirely into `__new__` and get rid of `__init__`
completely:
```py
class And(BooleanExpression):
"""AND operation expression - logical conjunction"""
def __new__(cls, left: BooleanExpression, right: BooleanExpression,
*rest: BooleanExpression):
if rest:
return reduce(And, (left, right, *rest))
if left is AlwaysFalse() or right is AlwaysFalse():
return AlwaysFalse()
elif left is AlwaysTrue():
return right
elif right is AlwaysTrue():
return left
new_expression = super().__new__(cls)
new_expression.args = (left, right)
vars(new_expression)["left"] = left
vars(new_expression)["right"] = right
return new_expression
```
2. Move the reduction logic out into a standalone function and simplify the
`And` class to only accept a `left` and `right` arg:
```py
def and_expression(left: BooleanExpression, right: BooleanExpression, *rest:
BooleanExpression) -> Union[And, AlwaysTrue, AlwaysFalse]:
if rest:
return reduce(And, (left, right, *rest))
if left is AlwaysFalse() or right is AlwaysFalse():
return AlwaysFalse()
elif left is AlwaysTrue():
return right
elif right is AlwaysTrue():
return left
return And(left, right)
```
A note on option 1, someone can still override `__new__` in a subclass as
@rdblue pointed out but I think the expectation holds that they should call
`super().__new__(cls, *args, **kwargs)`, just as it's expected to call
`super().__init__(*args, **kwargs)` if you override `__init__` in a subclass.
A note about option 2 is that it may make checking if something is an
instance of `And` less intuitive, in other words:
```py
from iceberg.expressions.base import and_expression, And
expr = and_expression(left, right, rest)
isinstance(expr, And)
```
as opposed to:
```py
from iceberg.expressions.base import And
expr = And(left, right, rest)
isinstance(expr, And)
```
(sorry for the super long comment...)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]