randolf-scholz commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1685415286
@pitrou It's not quite as easy...
- `bool(float("nan"))` is `True`. Historically, `NaN` floats were often used
as missing value indicators.
- `bool(None)` is `False`.
- `bool(NotImplemented)` is `True`, but gives a `DeprecationWarning`.
- `bool(pandas.NA)` raises `TypeError`.
- `pyarrow.NA.as_py()` returns `None`.
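The standard-library cases in this list can be checked directly. The snippet below is only a small sketch: it leaves out the pandas and pyarrow cases so that it runs without third-party packages, and it treats the `NotImplemented` case as version-dependent, on the assumption that newer Python releases may raise instead of warn.

```python
import warnings

# NaN is a non-zero float, so it is truthy.
nan_truthy = bool(float("nan"))

# None is always falsy.
none_truthy = bool(None)

# NotImplemented: turn warnings into errors so the outcome can be recorded
# whether the interpreter warns, raises, or stays silent.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        bool(NotImplemented)
        not_implemented_outcome = "no complaint"
    except (DeprecationWarning, TypeError) as exc:
        not_implemented_outcome = type(exc).__name__

print(nan_truthy, none_truthy, not_implemented_outcome)
```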
The last point would be a reason to return `None`. However, one should
consider how and when `bool` evaluation might pop up:
- assert statements.
- branching logic.
Coercing a missing boolean scalar to `False` instead of raising an error can
lead to some very nasty, hard-to-debug issues. I'd wager that in
the vast majority of cases, branching logic based on a missing boolean is just
nonsense and should be rejected.
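To make the failure mode concrete, here is a minimal sketch in plain Python. `CoercingNull`, `StrictNull`, and `describe` are hypothetical names invented for illustration; they are not pyarrow classes and only mimic the two candidate behaviours of a missing boolean.

```python
class CoercingNull:
    """Hypothetical missing boolean that silently coerces to False."""

    def __bool__(self):
        return False


class StrictNull:
    """Hypothetical missing boolean that refuses to be coerced."""

    def __bool__(self):
        raise TypeError("truth value of a missing boolean is undefined")


def describe(is_large):
    # Ordinary branching logic that assumes a real boolean.
    if is_large:
        return "large"
    return "small"


# The coercing variant takes the "small" branch although the value is unknown.
silent = describe(CoercingNull())

# The strict variant surfaces the problem at the branch itself.
try:
    describe(StrictNull())
    strict = "no error"
except TypeError:
    strict = "TypeError"

print(silent, strict)
```

With the coercing variant, nothing distinguishes "known to be false" from "unknown" once the branch has been taken.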
There is also some inherent inconsistency with conversion to Python if one
takes this route.
In a perfect world, one would expect the following diagram to commute:
┌─────────┐         ┌─────────┐
│pa Scalar│───op───►│pa Scalar│
└────┬────┘         └────┬────┘
     │                   │
     │                   │
   as_py               as_py
     │                   │
     ▼                   ▼
┌─────────┐         ┌─────────┐
│py Scalar│───op───►│py Scalar│
└─────────┘         └─────────┘
However, consider this example:
```python
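import pyarrow as pa
import pyarrow.compute as pc

# Arrow branch: apply the op first, then convert to Python.
pa_x = pa.scalar(None, type=pa.int64())
pa_y = pc.greater(pa_x, 0)
result = pa_y.as_py()  # None

# Python branch: convert first, then apply the op.
py_x = pa_x.as_py()
result = py_x > 0  # TypeError: '>' not supported between instances of 'NoneType' and 'int'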
```
So we see that if we had translated to the Python world immediately, there would
have been a `TypeError`.
In order to make the diagram "commute", the only reasonable solution is
therefore to raise a `TypeError` when converting the null-bool to Python. This
way `result` is the same in both branches: a `TypeError`.
By coercing the null-bool to `False`, one hides this `TypeError`, which, as
said before, can lead to all sorts of hard-to-debug bugs.
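The commuting argument can be sketched without pyarrow. Everything below is an assumption made for illustration: `NullableScalar`, `greater`, and `outcome` are hypothetical stand-ins, and `bool()` plays the role of the final conversion to a Python truth value. It is not the pyarrow implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NullableScalar:
    """Hypothetical stand-in for an Arrow scalar; None marks a missing value."""

    value: Optional[object]

    def as_py(self):
        return self.value

    def __bool__(self):
        # Proposed behaviour: a missing value has no truth value.
        if self.value is None:
            raise TypeError("cannot convert a null scalar to bool")
        return bool(self.value)


def greater(x, threshold):
    """Null-propagating comparison, loosely mimicking Arrow compute semantics."""
    if x.value is None:
        return NullableScalar(None)
    return NullableScalar(x.value > threshold)


def outcome(fn):
    """Run fn and report either its result or the fact that it raised TypeError."""
    try:
        return fn()
    except TypeError:
        return "TypeError"


x = NullableScalar(None)
arrow_first = outcome(lambda: bool(greater(x, 0)))   # op, then convert
python_first = outcome(lambda: bool(x.as_py() > 0))  # convert, then op

print(arrow_first, python_first)
```

Under these assumptions both paths end in a `TypeError`, so the diagram commutes; if `__bool__` returned `False` for the null case instead, the first path would yield `False` while the second would still raise.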