[jira] [Assigned] (ARROW-12728) [C++][Compute] Aggregates: implement count distinct

David Li (Jira) Wed, 04 Aug 2021 07:38:04 -0700


     [ https://issues.apache.org/jira/browse/ARROW-12728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


David Li reassigned ARROW-12728:
--------------------------------

    Assignee: David Li

> [C++][Compute] Aggregates: implement count distinct
> ---------------------------------------------------
>
>                 Key: ARROW-12728
>                 URL: https://issues.apache.org/jira/browse/ARROW-12728
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>    Affects Versions: 4.0.0
>            Reporter: Michal Nowakiewicz
>            Assignee: David Li
>            Priority: Major
>             Fix For: 6.0.0
>
>
> Implement a count distinct aggregate that internally reuses the hash table 
> from hash group by.
> This adds support for SQL queries like:
> select a, count(distinct b), count(distinct c) from t group by a
> For instance, to compute count(distinct b), the first group id mapping 
> assigns a group id based on the value of column a; a second group id mapping 
> is then done inside the count(distinct b) aggregate using the key 
> (groupid(a), b) (and similarly for count(distinct c)). 
> After all input rows are consumed, the final processing step scans the hash 
> table keyed on (groupid(a), b) and updates an array of counts indexed by 
> groupid(a). 
> The resulting array of counts is the output of the count distinct 
> aggregate.
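
The two-level mapping described above can be sketched in a few lines of
standalone C++. This is only an illustration of the idea, not Arrow's
implementation: std::unordered_map stands in for the hash table used by hash
group by, the names CountDistinctPerGroup and PairHash are hypothetical, both
columns are assumed to be int64, and nulls are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical hash for the composite key (groupid(a), b).
struct PairHash {
  std::size_t operator()(const std::pair<std::uint32_t, std::int64_t>& k) const {
    std::size_t h1 = std::hash<std::uint32_t>{}(k.first);
    std::size_t h2 = std::hash<std::int64_t>{}(k.second);
    // Simple hash combine; any reasonable mixing function would do here.
    return h1 ^ (h2 + static_cast<std::size_t>(0x9e3779b97f4a7c15ULL) +
                 (h1 << 6) + (h1 >> 2));
  }
};

// Sketch of: select a, count(distinct b) from t group by a
// Returns one count per group, indexed by groupid(a). Group ids are assigned
// in order of first appearance of each value of a.
std::vector<std::int64_t> CountDistinctPerGroup(
    const std::vector<std::int64_t>& a, const std::vector<std::int64_t>& b) {
  assert(a.size() == b.size());

  // First mapping: value of column a -> dense group id.
  std::unordered_map<std::int64_t, std::uint32_t> group_ids;
  // Second mapping, kept inside the aggregate: (groupid(a), b) -> dense id.
  std::unordered_map<std::pair<std::uint32_t, std::int64_t>, std::uint32_t,
                     PairHash>
      pair_ids;

  for (std::size_t i = 0; i < a.size(); ++i) {
    // emplace leaves an existing entry untouched, so ids stay stable.
    std::uint32_t g =
        group_ids.emplace(a[i], static_cast<std::uint32_t>(group_ids.size()))
            .first->second;
    pair_ids.emplace(std::make_pair(g, b[i]),
                     static_cast<std::uint32_t>(pair_ids.size()));
  }

  // Final step: scan the (groupid(a), b) table once and bump the count of
  // the group each distinct pair belongs to.
  std::vector<std::int64_t> counts(group_ids.size(), 0);
  for (const auto& entry : pair_ids) {
    ++counts[entry.first.first];
  }
  return counts;
}
```

Each distinct (groupid(a), b) pair appears exactly once in the second table,
so the final scan adds one to a group's count per distinct b value seen in
that group. A count(distinct c) aggregate would keep its own second table
keyed on (groupid(a), c) while sharing the first mapping.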



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
