Michal Nowakiewicz created ARROW-12728:
------------------------------------------

             Summary: [C++][Compute] Aggregates: implement count distinct
                 Key: ARROW-12728
                 URL: https://issues.apache.org/jira/browse/ARROW-12728
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
    Affects Versions: 4.0.0
            Reporter: Michal Nowakiewicz
             Fix For: 5.0.0


Implement a count distinct aggregate that internally reuses the hash table 
from hash group by.

This adds support for SQL queries such as:
select a, count(distinct b), count(distinct c) from t group by a

For instance, to compute count(distinct b), the first group id mapping assigns 
a group id based on the value of column a. A second group id mapping is then 
done inside the count(distinct b) aggregate using the key (groupid(a), b), and 
similarly for count(distinct c). 
After all input rows are consumed, the final processing step scans the hash 
table keyed on (groupid(a), b) and updates an array of counts indexed by 
groupid(a): each distinct key increments the count for its group. 
The resulting array of counts is the output of the count distinct aggregate.
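
The two-level mapping described above can be sketched roughly as follows. This 
is only an illustration using standard library containers, not Arrow's hash 
table or kernel API: the names CountDistinctPerGroup and PairHash are 
hypothetical, both columns are assumed to be non-null int64 values, and group 
ids are assumed to be assigned densely in order of first appearance.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical hash for the composite key (groupid(a), b).
struct PairHash {
  std::size_t operator()(const std::pair<uint32_t, int64_t>& key) const {
    std::size_t h1 = std::hash<uint32_t>{}(key.first);
    std::size_t h2 = std::hash<int64_t>{}(key.second);
    // Simple hash combine; any reasonable mixing function would do here.
    return h1 ^ (h2 + 0x9e3779b97f4a7c15ULL + (h1 << 6) + (h1 >> 2));
  }
};

// Sketch of: select a, count(distinct b) from t group by a
// Returns one count per group, indexed by groupid(a).
std::vector<int64_t> CountDistinctPerGroup(const std::vector<int64_t>& a,
                                           const std::vector<int64_t>& b) {
  // First mapping: value of a -> dense group id (0, 1, 2, ...).
  std::unordered_map<int64_t, uint32_t> group_ids;
  // Second mapping, owned by the aggregate: set of distinct (groupid(a), b).
  std::unordered_set<std::pair<uint32_t, int64_t>, PairHash> seen;

  for (std::size_t i = 0; i < a.size(); ++i) {
    // emplace leaves an existing entry untouched, so a repeated value of a
    // keeps the id it was given the first time it appeared.
    uint32_t gid =
        group_ids.emplace(a[i], static_cast<uint32_t>(group_ids.size()))
            .first->second;
    seen.emplace(gid, b[i]);  // duplicates of (gid, b) are ignored
  }

  // Final step: scan the second hash table and bump the count of each
  // key's group.
  std::vector<int64_t> counts(group_ids.size(), 0);
  for (const auto& key : seen) {
    ++counts[key.first];
  }
  return counts;
}
```

With a = {1, 1, 2, 2, 1, 2} and b = {10, 10, 20, 30, 40, 50}, the group for 
a = 1 gets id 0 and sees two distinct b values, and the group for a = 2 gets 
id 1 and sees three. A count(distinct c) aggregate would keep its own second 
table keyed on (groupid(a), c) while sharing the first mapping.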



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
