westonpace commented on issue #34173:
URL: https://github.com/apache/arrow/issues/34173#issuecomment-1448480463
> So to be explicit: count nulls as separate values like value_counts does, and thus essentially do what the OP asked?
Yes.
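A minimal pure-Python sketch of that semantics follows. It is not Arrow's `mode` kernel. Nulls are modeled here as `None` and tallied as one more distinct value, the same way a `value_counts`-style count treats them. The helper name `mode_counting_nulls` and the first-seen tie-break are assumptions made for this illustration.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Tuple

def mode_counting_nulls(values: Iterable[Hashable]) -> Tuple[Any, int]:
    """Return the most frequent value and its count, treating None (null) as a value.

    Hypothetical helper: ties go to the value seen first (a choice made for this sketch).
    """
    counts = Counter(values)  # None is hashable, so all nulls share a single bucket
    if not counts:
        raise ValueError("mode of empty input is undefined")
    value, count = counts.most_common(1)[0]
    return value, count

data = [1, None, 2, None, None, 1]
print(mode_counting_nulls(data))                               # (None, 3)
print(mode_counting_nulls(v for v in data if v is not None))   # (1, 2)
```

Dropping nulls before counting (the second call) gives the other reading of the question, where nulls can never be the mode.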
In the future it might be nice to have a variation on `mode` that returns a
single row. I suppose the big challenge with `mode` is that it isn't
decomposable: you can't break it up into separate "consume" and "merge" steps.
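A small pure-Python illustration of why per-batch results don't merge. The batch contents and the first-seen tie-break are made up for this sketch. The column is split into two batches, as a streaming "consume" step would see it, and the true mode is not among either batch's mode.

```python
from collections import Counter

def mode(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]

# The same column split into two batches, as a streaming "consume" step sees it.
batch_a = ["a", "a", "a", "b", "b"]
batch_b = ["c", "c", "c", "b", "b"]

partial_modes = [mode(batch_a), mode(batch_b)]  # ['a', 'c']
true_mode = mode(batch_a + batch_b)             # 'b' (appears 4 times overall)

# No "merge" of the per-batch modes can recover 'b': it is not among them.
print(partial_modes, true_mode)                 # ['a', 'c'] b

# A correct merge needs each batch's full histogram, whose size grows
# with the number of distinct values instead of staying fixed.
merged = Counter(batch_a) + Counter(batch_b)
print(merged.most_common(1))                    # [('b', 4)]
```

Carrying the whole histogram as intermediate state does give the right answer, but that state is about as large as the distinct values of the input, which is what makes a bounded consume/merge split impractical.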
However, these non-decomposable aggregates are often a good fit for window
functions, or for a "group by" where you have many small groups. Once
spilling is implemented, there are also ways to tackle them with larger
groups. So it would be good to have a new `arrow::compute::Function::Kind`
someday, perhaps `WHOLE_AGGREGATE` or `WINDOW_AGGREGATE` or something
similar. The rules would be:
* Must output a single value (one row, one column, i.e. a scalar)
* Can expect to receive all of the data as one large table
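To make the two rules concrete, here is a pure-Python sketch of what such a kind could look like. `WholeAggregate`, `group_by`, and `sliding_window` are hypothetical names invented for illustration, not Arrow APIs. The only point is that each call hands the function a complete, materialized input and takes back one value, which is why small groups and windows fit naturally.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable, List, Sequence

@dataclass(frozen=True)
class WholeAggregate:
    """Hypothetical WHOLE_AGGREGATE-style function (not an Arrow API).

    Contract: it receives all rows of its input at once and returns one scalar.
    """
    name: str
    fn: Callable[[List[Any]], Any]

    def __call__(self, rows: Sequence[Any]) -> Any:
        # The whole input is materialized up front; there is no consume/merge split.
        return self.fn(list(rows))

def group_by(keys: Sequence[Hashable], values: Sequence[Any],
             agg: WholeAggregate) -> Dict[Hashable, Any]:
    """Gather each group's rows, then hand every group to agg in one piece."""
    groups: Dict[Hashable, List[Any]] = defaultdict(list)
    for key, value in zip(keys, values):
        groups[key].append(value)
    return {key: agg(rows) for key, rows in groups.items()}

def sliding_window(values: Sequence[Any], size: int, agg: WholeAggregate) -> List[Any]:
    """Apply agg to each full window of `size` consecutive rows."""
    return [agg(values[end - size + 1 : end + 1]) for end in range(size - 1, len(values))]

# Mode as a whole aggregate; ties go to the value seen first.
mode = WholeAggregate("mode", lambda rows: Counter(rows).most_common(1)[0][0])

print(group_by(["g1", "g1", "g1", "g2", "g2"], [1, 1, 2, 3, 3], mode))  # {'g1': 1, 'g2': 3}
print(sliding_window([1, 2, 2, 3, 3, 3], 3, mode))                     # [2, 2, 3, 3]
```

Because each group or window is small, holding it in memory as a whole is cheap. Large groups are where spilling would have to come in.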
I can open up a new issue for this.