[
https://issues.apache.org/jira/browse/ARROW-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273407#comment-17273407
]
Joris Van den Bossche commented on ARROW-11412:
-----------------------------------------------
[~romankarlstetter] Thanks for the report!
So using {{&}} instead of {{and}} does work (and the same for {{|}} instead of
{{or}}, and {{~}} instead of {{not}}):
{code}
In [10]: ds.scalar(False) & ds.scalar(True)
Out[10]: <pyarrow.dataset.Expression (false and true)>
{code}
(note it shows "false and true" because the expression is only captured, not
directly simplified)
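To make the "only captured" point concrete, here is a minimal sketch of how {{&}}, {{|}} and {{~}} can be overloaded to build an unevaluated expression instead of computing a result. The {{Expr}} class and {{scalar}} helper are hypothetical toys for illustration, not pyarrow's actual implementation:

{code:python}
# Hypothetical toy class, NOT pyarrow.dataset.Expression: it only records
# which operation was requested, it never evaluates anything.
class Expr:
    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        # `a & b` calls a.__and__(b): build a new, unevaluated node.
        return Expr(f"({self.text} and {other.text})")

    def __or__(self, other):
        # `a | b` calls a.__or__(b).
        return Expr(f"({self.text} or {other.text})")

    def __invert__(self):
        # `~a` calls a.__invert__().
        return Expr(f"(not {self.text})")

    def __repr__(self):
        return f"<Expr {self.text}>"


def scalar(value):
    # Hypothetical helper mimicking the shape of ds.scalar(...)
    return Expr("true" if value else "false")


print(scalar(False) & scalar(True))  # <Expr (false and true)>
print(scalar(False) | scalar(True))  # <Expr (false or true)>
print(~scalar(True))                 # <Expr (not true)>
{code}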
Now, the reason for the unexpected results is that we can't control the
behaviour of {{and}}, {{or}} and {{not}}: Python lets you override {{&}} and {{|}}
through the bitwise {{__and__}} and {{__or__}} operators, but offers no such hooks
for {{and}}, {{or}} and {{not}}. So plain Python logic is used for those operators,
which looks at the "truthiness" of each object ({{bool(..)}}, which _can_ be
overridden with {{__bool__}}). And because we currently don't override it, every
expression (including the "False" expression) is simply seen as "true".
All the example return values you show above follow from that. For example, in
{{ds.scalar(False) or ds.scalar(True)}}, Python first checks whether the left
value is "true"; if so, it returns that value ({{or}} short-circuits here without
evaluating the right side), and otherwise it returns the right side value. In our
case, because {{ds.scalar(False)}} is "true", it is simply returned. You can
observe the same with {{2 or 3}}, which returns 2 because 2 is a "truthy" value.
The other examples can be explained the same way. Even the ones with the
"expected" result are not fully correct: for example, {{not ds.scalar(True)}}
returns a plain Python {{False}} rather than an expression, which is not what we
would ideally want.
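The behaviour described above can be reproduced without pyarrow. This sketch uses a hypothetical {{Opaque}} class that defines neither {{__bool__}} nor {{__len__}}, so Python's default applies and every instance is truthy:

{code:python}
# Hypothetical stand-in for an expression object that does not override
# __bool__ (or __len__), so bool(instance) is always True.
class Opaque:
    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return f"<Opaque {self.name}>"


f, t = Opaque("false"), Opaque("true")

# `x or y` returns x if bool(x) is true, otherwise y: f is truthy, so f wins.
print(f or t)        # <Opaque false>
# `x and y` returns y if bool(x) is true, otherwise x: f is truthy, so t wins.
print(f and t)       # <Opaque true>
print(t and f)       # <Opaque false>
# `not x` always returns a plain bool, never an Opaque.
print(not t, not f)  # False False
# The same rule applied to ints.
print(2 or 3)        # 2
{code}

The printed values have the same pattern as the results in the report, including {{not}} returning a bare {{False}} in both cases.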
Now, we are limited here by what Python lets us customize, so I don't think we
can get {{and}}, {{or}} and {{not}} fully working as we would like. The better
option might be to raise an error with an informative message in {{__bool__}}, so
that people don't run into this trap (similar to how numpy arrays also raise in
{{__bool__}}; try e.g. {{not np.array([1, 2])}}).
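A minimal sketch of that option, again on a hypothetical toy class rather than the real {{pyarrow.dataset.Expression}}; the exception type ({{ValueError}}) and the message wording are assumptions:

{code:python}
# Hypothetical toy expression class that refuses implicit truth-value testing.
class Expr:
    def __init__(self, text):
        self.text = text

    def __and__(self, other):
        return Expr(f"({self.text} and {other.text})")

    def __bool__(self):
        # Called by `and`, `or`, `not` and `if expr:`; raising here turns a
        # silently wrong result into an explicit, informative error.
        raise ValueError(
            "An expression has no Python truth value; "
            "use '&', '|' and '~' instead of 'and', 'or' and 'not'."
        )

    def __repr__(self):
        return f"<Expr {self.text}>"


a, b = Expr("false"), Expr("true")
print(a & b)  # <Expr (false and true)>

try:
    a and b  # bool(a) is evaluated first and raises
except ValueError as exc:
    print("ValueError:", exc)
{code}

With this in place, {{not a}} and {{a or b}} raise as well, because all three keyword operators go through {{__bool__}}, while {{&}}, {{|}} and {{~}} keep working.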
> [Python] (C++?) Expression evaluation problem for logical boolean expressions
> (and, or, not)
> --------------------------------------------------------------------------------------------
>
> Key: ARROW-11412
> URL: https://issues.apache.org/jira/browse/ARROW-11412
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 2.0.0, 3.0.0
> Reporter: Roman Karlstetter
> Priority: Major
>
> There's a problem with boolean "and", "or" and "not" expressions when
> creating them in python (or I'm doing something completely stupid).
>
> {code:java}
> >>> import pyarrow
> >>> pyarrow.__version__
> '3.0.0'
> >>> import pyarrow.dataset as ds
> >>> ds.scalar(False) and ds.scalar(True) # <--- I expect false
> <pyarrow.dataset.Expression true>
> >>> ds.scalar(True) and ds.scalar(False) # this works
> <pyarrow.dataset.Expression false>
> >>> ds.scalar(True) or ds.scalar(False) # this works
> <pyarrow.dataset.Expression true>
> >>> ds.scalar(False) or ds.scalar(True) # <--- I expect true
> <pyarrow.dataset.Expression false>
> >>> not ds.scalar(True) # this works
> False
> >>> not ds.scalar(False) # <--- I expect true
> False
> {code}
> I tried to figure out what goes wrong here, but there are no obvious problems
> in the python code, same for C++ (but I didn't quite understand everything of
> it yet).
>
> This happens with pyarrow3 and pyarrow2
>
>