[GitHub] [arrow] alamb edited a comment on pull request #8222: ARROW-10043: [Rust][DataFusion] Implement COUNT(DISTINCT col)

GitBox Tue, 22 Sep 2020 21:21:12 -0700


alamb edited a comment on pull request #8222:
URL: https://github.com/apache/arrow/pull/8222#issuecomment-696842287



   @drusso I think you are correct that we would need a separate group-by 
operator for each `COUNT(DISTINCT ...)` and would then combine their results:
   
   So `SELECT c1, COUNT(DISTINCT c2), COUNT(DISTINCT c3) FROM t1 GROUP BY c1` 
might look like:
   
   ```
   HashAggregateExec: // this second phase then counts
     group_expr:
       Column(c1)
     aggr_expr:
       CountReduce(Column(c2))
     input:
       HashAggregateExec: // this first agg expr finds all distinct values of (c1,c2)
         group_expr:
           Column(c1), Column(c2)
         input:
           CsvExec:
   
   JOIN ON (c1):
   
   HashAggregateExec: // this second phase then counts
     group_expr:
       Column(c1)
     aggr_expr:
       CountReduce(Column(c3))
     input:
       HashAggregateExec: // this first agg expr finds all distinct values of (c1,c3)
         group_expr:
           Column(c1), Column(c3)
         input:
           CsvExec:
   ```
   
   Or something along those lines. I like your suggestion to get an 
implementation in (this one) and then iterate as needed.
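
To make the two-phase idea concrete, here is a minimal sketch in plain Rust using only the standard library. It does not use DataFusion's actual operators; `Row`, `count_distinct_by_c1`, and `count_distinct_c2_c3` are hypothetical names invented for illustration, and the columns are assumed to be non-null integers. Each call to the helper plays the role of one pair of stacked `HashAggregateExec` nodes, and the final map plays the role of the `JOIN ON (c1)`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One row of the hypothetical table t1: (c1, c2, c3).
type Row = (&'static str, i64, i64);

/// Two-phase COUNT(DISTINCT value) GROUP BY c1 over (c1, value) pairs.
fn count_distinct_by_c1(
    pairs: impl Iterator<Item = (&'static str, i64)>,
) -> BTreeMap<&'static str, usize> {
    // Phase 1: group by (c1, value), keeping each distinct pair once.
    let distinct: BTreeSet<(&'static str, i64)> = pairs.collect();

    // Phase 2: group by c1 and count the surviving pairs.
    let mut counts = BTreeMap::new();
    for (c1, _value) in distinct {
        *counts.entry(c1).or_insert(0usize) += 1;
    }
    counts
}

/// Runs one two-phase aggregate per distinct column, then joins on c1.
fn count_distinct_c2_c3(rows: &[Row]) -> BTreeMap<&'static str, (usize, usize)> {
    let by_c2 = count_distinct_by_c1(rows.iter().map(|r| (r.0, r.1)));
    let by_c3 = count_distinct_by_c1(rows.iter().map(|r| (r.0, r.2)));

    // JOIN ON (c1): both sides come from the same rows and there are no
    // nulls in this sketch, so every c1 key is present on both sides.
    by_c2
        .into_iter()
        .map(|(c1, n2)| (c1, (n2, by_c3[c1])))
        .collect()
}
```

Calling `count_distinct_c2_c3` on a slice of rows returns, for each `c1`, the pair of distinct counts for `c2` and `c3`. A real plan would also have to decide how nulls and groups missing from one side of the join are handled, which this sketch sidesteps.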


