jorisvandenbossche commented on issue #34173:
URL: https://github.com/apache/arrow/issues/34173#issuecomment-1430583319

   I have to say that the current `skip_null` keyword is a bit strange, and it 
also doesn't do what is documented:
   
   ```
   >>> pc.mode([1, 2, 2, None, None], 2, skip_nulls=True) 
   <pyarrow.lib.StructArray object at 0x7f64ea3ab940>
   -- is_valid: all not null
   -- child 0 type: int64
     [
       2,
       1
     ]
   -- child 1 type: int64
     [
       2,
       1
     ]
   
   >>> pc.mode([1, 2, 2, None, None], 2, skip_nulls=False)
   <pyarrow.lib.StructArray object at 0x7f64ea4c3e20>
   -- is_valid: all not null
   -- child 0 type: int64
     []
   -- child 1 type: int64
     []
   ```
   
   So if there are nulls and you specify to not skip them, the result becomes 
_empty_, while the docs say that if any input value is null, the output becomes 
null for `skip_nulls=False` (but null and empty is not the same).
   
   I am not fully sure what "mode" should do with nulls. On the one hand, the 
mode kernel is related to "value_counts", and this kernel will also count 
nulls. On the other hand, it's more seen as a (numerical) aggregation kernel, 
and those all skip (by default) or just propagate nulls (I don't think there is 
another aggregation kernel that really "does" something with the nulls).
   
   cc @cyb70289 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to