westonpace commented on issue #34173:
URL: https://github.com/apache/arrow/issues/34173#issuecomment-1442268812
From an SQL perspective "null" means "this value is unknown" which is why
skip_nulls=False yields null on something like sum. `1 + 2 + 3 + "some unknown
number"` yields "something unknown".
Mode is unusual in that you might know the correct answer even if you
don't know all the numbers. So I think the correct behavior of
skip_nulls=False should be (apologies for the bad pseudo-code):
```
if (count of the second most frequent element + null count
        < count of the most frequent element):
    return the most frequent element
else:
    return null
```
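The rule above could be sketched in plain Python roughly as follows. This is only an illustration of the proposal, not the Arrow kernel: `mode_keep_nulls` is a hypothetical name, and `None` stands in for null.

```python
from collections import Counter


def mode_keep_nulls(values):
    """Return the mode only if the nulls cannot change it, else None.

    Hypothetical helper; None plays the role of null.
    """
    null_count = sum(v is None for v in values)
    # Two most frequent non-null values, most frequent first.
    counts = Counter(v for v in values if v is not None).most_common(2)
    if not counts:
        return None  # no known values at all
    top_value, top_count = counts[0]
    runner_up_count = counts[1][1] if len(counts) > 1 else 0
    # Even if every null turned out to equal the runner-up (or some new
    # value), the top value must still win strictly.
    if runner_up_count + null_count < top_count:
        return top_value
    return None
```

For example, `mode_keep_nulls([1, 1, 1, 2, None])` gives `1`, because the single null cannot lift `2` past `1`, while `mode_keep_nulls([1, 1, 2, None])` gives `None`, because the null could create a tie.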
Therefore, if null is the most frequent element, then I think null should be
the result when skip_nulls is false.
If skip_nulls=True then the behavior should be the same as
`mode(x.filter(pc.is_valid(x)))` and should only return null if every element is
null.