[ https://issues.apache.org/jira/browse/ARROW-13764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405274#comment-17405274 ]
David Li commented on ARROW-13764:
----------------------------------
And to be clear, this is about null groups (null keys), not null values themselves? i.e.
{noformat}
keys = [0, 0, 1, 1, null]
values = ["a", null, "b", "b", "c"]
{noformat}
should give
{noformat}
counts = {0: 2, 1: 1}
{noformat}
instead of
{noformat}
counts = {0: 2, 1: 1, null: 1}
{noformat}
?
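To make the two readings concrete, here is a minimal plain-Python sketch. It is not the Arrow kernel or its API: the function name and both flags are hypothetical, chosen only to separate "drop the null group" from "skip null values inside each group".

```python
from collections import defaultdict


def grouped_count_distinct(keys, values, keep_null_group=True, count_null_values=True):
    """Count distinct values per key (illustrative only, not Arrow's implementation).

    keep_null_group:   hypothetical flag; if False, rows whose key is None are dropped.
    count_null_values: hypothetical flag; if False, None values are not counted.
    """
    distinct = defaultdict(set)
    for key, value in zip(keys, values):
        if key is None and not keep_null_group:
            continue
        if value is None and not count_null_values:
            distinct[key]  # touch the entry so the group still appears in the result
            continue
        distinct[key].add(value)
    return {key: len(seen) for key, seen in distinct.items()}


keys = [0, 0, 1, 1, None]
values = ["a", None, "b", "b", "c"]

with_null_group = grouped_count_distinct(keys, values)
without_null_group = grouped_count_distinct(keys, values, keep_null_group=False)
without_null_values = grouped_count_distinct(keys, values, count_null_values=False)
```

With the data above, `with_null_group` is `{0: 2, 1: 1, None: 1}` and `without_null_group` is `{0: 2, 1: 1}`, matching the two outputs in the question. `without_null_values` is `{0: 1, 1: 1, None: 1}`, which is the other possible reading: the null group stays, but null values are not counted.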
> [C++] Implement ScalarAggregateOptions for count_distinct (grouped)
> --------------------------------------------------------------------
>
> Key: ARROW-13764
> URL: https://issues.apache.org/jira/browse/ARROW-13764
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++
> Reporter: Nic Crane
> Assignee: David Li
> Priority: Major
> Labels: kernel
> Fix For: 6.0.0
>
>
> I'm writing the R bindings for the grouped {{count_distinct}} kernel, but the
> current implementation counts nulls as their own group. To match the R
> behaviour, I need to be able to specify whether or not to remove NA/NULL
> values.
> Please could we have ScalarAggregateOptions implemented for
> {{count_distinct}}?