[GitHub] [arrow] jorisvandenbossche commented on pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data

GitBox Wed, 30 Sep 2020 05:27:39 -0700


jorisvandenbossche commented on pull request #8294:
URL: https://github.com/apache/arrow/pull/8294#issuecomment-701357437



   Just a high-level remark (I haven't looked at the code yet), but I think the 
example you gave:
   
   ```
   In []: a = pa.array([True, None], type='bool') 
       ...:  
       ...: # option 1 
       ...: pc.any(a).as_py() is True 
       ...: pc.any_kleene(a).as_py() is None 
       ...:  
       ...: # option 2 
       ...: pc.any(null_handling='skip') is True 
       ...: pc.any(null_handling='emit_null') is None
   ```
   
   has the wrong output for the Kleene version. With Kleene logic, the second 
output would also be True: the array already contains a True, so the missing 
value no longer matters. 
   
   Using Kleene logic or not is not the same as the skip/emit-null handling. By 
default, if nulls are skipped, it doesn't matter whether you use Kleene logic 
or not, since there are no nulls left to behave one way or the other. You only 
get different behaviour when nulls are not skipped: `any([True, None], 
skipna=False)` and `any_kleene([True, None], skipna=False)` would still both 
give True, since the array contains a True. But e.g. `any([False, None], 
skipna=False)` would give False (the missing value being treated as False), 
whereas `any_kleene([False, None], skipna=False)` would give null.
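   
   To make the four cases above concrete, here is a minimal pure-Python sketch 
of the semantics I have in mind. The function names `any_plain` and 
`any_kleene` and the `skipna` keyword are only illustrative (they are not the 
actual pyarrow or pandas API), and `None` stands in for a null:

```python
from typing import Iterable, Optional


def any_plain(values: Iterable[Optional[bool]], skipna: bool = True) -> bool:
    """Hypothetical non-Kleene 'any': a null never produces a null result."""
    if skipna:
        # Drop the nulls before reducing.
        kept = [v for v in values if v is not None]
    else:
        # Assumption from the discussion: a non-skipped null counts as False.
        kept = [False if v is None else v for v in values]
    return any(kept)


def any_kleene(values: Iterable[Optional[bool]], skipna: bool = True) -> Optional[bool]:
    """Hypothetical Kleene 'any': True wins over null, null wins over False."""
    kept = list(values)
    if skipna:
        # With nulls removed, Kleene logic has nothing left to act on.
        kept = [v for v in kept if v is not None]
    if any(v is True for v in kept):
        return True   # a single True decides the result, nulls or not
    if any(v is None for v in kept):
        return None   # no True, but an unknown value could have been True
    return False


if __name__ == "__main__":
    for data in ([True, None], [False, None]):
        for skipna in (True, False):
            print(data, "skipna =", skipna,
                  "| plain:", any_plain(data, skipna=skipna),
                  "| kleene:", any_kleene(data, skipna=skipna))
```

   The only combination where the two functions disagree is `[False, None]` 
with `skipna=False`.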
   
   See also our discussions in pandas about this 
(https://github.com/pandas-dev/pandas/issues/29686; 
https://pandas.pydata.org/pandas-docs/stable/user_guide/boolean.html#kleene-logical-operations)


