randolf-scholz commented on PR #34993:
URL: https://github.com/apache/arrow/pull/34993#issuecomment-1685415286

   @pitrou It's not quite as easy...
   
   - `bool(float("nan"))` is `True`. Historically, NaN floats were often used 
as missing value indicators.
   - `bool(None)` is `False`.
   - `bool(NotImplemented)` is `True`, but emits a `DeprecationWarning` (since Python 3.9).
   - `bool(pandas.NA)` raises `TypeError`.
   - `pyarrow.NA.as_py()` returns `None`.
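
The two built-in cases above can be checked without any third-party package. A minimal sketch; the variable names are illustrative only, and the `NotImplemented`, pandas, and pyarrow cases are left out because their behavior depends on the installed versions:

```python
import math

nan = float("nan")

# NaN is a non-zero float, so it is truthy even though it often means "missing".
nan_is_truthy = bool(nan)         # True
none_is_truthy = bool(None)       # False

# NaN never compares equal to itself, so it must be detected with math.isnan.
nan_equals_itself = nan == nan    # False
nan_detected = math.isnan(nan)    # True
```

So the two most common "missing" markers in plain Python already disagree on truthiness.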
   
   The last point would be an argument for mirroring `None`, that is, returning 
`False`. However, one should consider how and when `bool` evaluation might pop up:
   
   - assert statements.
   - branching logic.
   
   Coercing a missing boolean scalar to `False` instead of raising an error can 
lead to some very nasty, hard-to-debug issues. I'd wager that in the vast 
majority of cases, branching on a missing boolean is simply nonsense and 
should be rejected with an error.
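
To make the branching hazard concrete, here is a rough sketch with two hypothetical stand-in classes (`LenientNull` and `StrictNull` are not pyarrow types, and `describe` is an invented helper). It only illustrates how `__bool__` decides whether an `if` silently picks a branch or fails loudly:

```python
class LenientNull:
    """Hypothetical missing boolean that silently coerces to False."""

    def __bool__(self) -> bool:
        return False


class StrictNull:
    """Hypothetical missing boolean that refuses truth-value testing."""

    def __bool__(self) -> bool:
        raise TypeError("truth value of a missing boolean is ambiguous")


def describe(is_valid) -> str:
    # The `if` implicitly calls bool(is_valid).
    if is_valid:
        return "valid"
    return "invalid"


# The lenient variant picks the "invalid" branch although the answer is unknown.
lenient_outcome = describe(LenientNull())

# The strict variant surfaces the problem at the point of branching.
try:
    strict_outcome = describe(StrictNull())
except TypeError as exc:
    strict_outcome = f"TypeError: {exc}"
```

With the lenient variant, "unknown" is indistinguishable from "invalid" downstream, which is the kind of bug that is hard to trace back.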
   
   There is also some inherent inconsistency with conversion to Python if one 
takes this route.
   In a perfect world, one would expect the following diagram to commute:
   
       ┌─────────┐         ┌─────────┐
       │pa Scalar├───op───►│pa Scalar│
       └────┬────┘         └────┬────┘
            │                   │
            │                   │
          as_py               as_py
            │                   │
            ▼                   ▼
       ┌─────────┐         ┌─────────┐
       │py Scalar├───op───►│py Scalar│
       └─────────┘         └─────────┘
   
   However, consider this example:
   
   
   ```python
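   import pyarrow as pa
   import pyarrow.compute as pc
   
   # Arrow branch: apply the op first, then convert to Python.
   pa_x = pa.scalar(None, type=pa.int64())
   pa_y = pc.greater(pa_x, 0)
   result = pa_y.as_py()   # None
   
   # Python branch: convert first, then apply the op.
   py_x = pa_x.as_py()
   # TypeError: '>' not supported between instances of 'NoneType' and 'int'
   result = py_x > 0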
   ```
   
   So we see that if we had translated to the Python world immediately, there 
would have been a `TypeError`.
   
   In order to make the diagram "commute", the only reasonable solution is 
therefore to raise a `TypeError` when converting the null-bool to a Python 
`bool`. This way the outcome is the same in both branches: a `TypeError`.
   
   By coercing the null-bool to `False`, one hides this `TypeError`, which, as 
said before, can lead to all sorts of hard-to-debug bugs.
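
The commutation argument can be modeled in pure Python. This is only a sketch under stated assumptions: `NullScalar` is a hypothetical stand-in for a null Arrow scalar (not pyarrow's class), and `outcome` and `both_paths` are invented helpers. It compares "op, then `bool`" on the Arrow side against "`as_py`, then op, then `bool`" on the Python side:

```python
def outcome(thunk):
    """Return thunk()'s value, or the exception class name if it raises."""
    try:
        return thunk()
    except Exception as exc:
        return type(exc).__name__


class NullScalar:
    """Hypothetical stand-in for a null Arrow scalar."""

    def __init__(self, strict: bool):
        self.strict = strict

    def greater(self, other):
        # Null-propagating comparison: null > anything is null.
        return NullScalar(self.strict)

    def as_py(self):
        return None

    def __bool__(self) -> bool:
        if self.strict:
            raise TypeError("cannot convert a null scalar to bool")
        return False


def both_paths(strict: bool):
    x = NullScalar(strict)
    arrow_first = outcome(lambda: bool(x.greater(0)))     # op, then bool
    python_first = outcome(lambda: bool(x.as_py() > 0))   # as_py, then op
    return arrow_first, python_first


lenient_paths = both_paths(strict=False)
strict_paths = both_paths(strict=True)
```

In this model, only the strict variant gives the same outcome on both paths; the lenient variant yields `False` on one path and a `TypeError` on the other.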
   
   
   
   
   
   

