jorisvandenbossche commented on issue #34173:
URL: https://github.com/apache/arrow/issues/34173#issuecomment-1445804812

   > So I think the correct behavior of `skip_nulls=False` should be (apologies for the bad pseudo-code):
   
   The problem with this logic is that the `mode` kernel also returns the count (we support returning more than just the "top" mode as of https://github.com/apache/arrow/pull/8637). If the null count is factored in like that, what value would you report for the count in the case where there is an actual most frequent element?
   
   > If `skip_nulls=True` then the behavior should be the same as `mode(x.filter(pc.is_valid(x)))` and should only return null if every element is null.
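The quoted `skip_nulls=True` rule can be sketched in plain Python as "drop the nulls, then take the mode", with a null (`None`) answer only when nothing survives the filter. This is a hypothetical model of the proposal, not Arrow code; `mode_skip_nulls` is an illustrative name, and the list comprehension stands in for `x.filter(pc.is_valid(x))`.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def mode_skip_nulls(
    values: Sequence[Optional[int]], n: int = 1
) -> Optional[List[Tuple[int, int]]]:
    """Hypothetical sketch of the proposed skip_nulls=True semantics."""
    valid = [v for v in values if v is not None]  # the filter(is_valid) step
    if not valid:
        return None  # proposed: null only when every element is null
    counts = Counter(valid)
    # Descending count, ties broken by the smaller value.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]


print(mode_skip_nulls([1, None, 1, 2]))
print(mode_skip_nulls([None, None]))
```

Under the current behavior described below, the all-null case would instead produce an empty result rather than `None`.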
   
   Agreed that this is the typical logic, but currently, for an empty array (or an all-null array after skipping the nulls), we return an empty result, not null:
   
   ```
   >>> pc.mode(pa.array([], pa.int64()), 1, skip_nulls=True) 
   <pyarrow.lib.StructArray object at 0x7fbdaf658580>
   -- is_valid: all not null
   -- child 0 type: int64
     []
   -- child 1 type: int64
     []
   ```
   
   Should that be changed to a null struct / struct of nulls?
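The candidates in that question differ in length as well as in content. As a rough illustration, assuming the result type stays `struct<mode: int64, count: int64>`, each option is written below the way `StructArray.to_pylist()` would render it; the `CANDIDATES` table is hypothetical and only models the shapes.

```python
from typing import Dict, List, Optional

# One result row: None for a null struct, otherwise a dict of (possibly null) fields.
Row = Optional[Dict[str, Optional[int]]]

CANDIDATES: Dict[str, List[Row]] = {
    "empty result (current behavior)": [],
    "null struct": [None],  # the row itself is null
    "struct of nulls": [{"mode": None, "count": None}],  # valid row, null fields
}

for name, rows in CANDIDATES.items():
    print(f"{name}: length={len(rows)}, rows={rows}")
```

Both null variants have length 1, unlike the current empty result, so switching would change the result length that callers see for empty or all-null input.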

