[GitHub] [arrow-datafusion] mingmwang commented on pull request #6034: Implement Streaming Aggregation: Do not break pipeline in aggregation if group by columns are ordered

via GitHub Tue, 25 Apr 2023 00:52:08 -0700


mingmwang commented on PR #6034:
URL: 
https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1521324807


   @ozankabak @mustafasrepo 
   
   I strongly suggest having a separate implementation (Exec) for Streaming 
Aggregation. This is similar to how we separate `HashJoinExec` / 
`SortMergeJoinExec` and `UnionExec` / `InterleaveExec`.
   With the physical plans split, each plan clearly conveys which 
physical operators it is composed of.
   The split also lets us keep each operator's code base 
(HashAggregation and SortAggregation) relatively simple. 
   We can further keep a relatively lightweight grouping state for each 
operator. The memory layout of the grouping state is critical for performance. 
For hash aggregation performance, we currently still have a huge gap 
compared with DuckDB.
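
   To illustrate why the grouping state can differ so much between the two 
operators, here is a minimal standalone sketch. It is not DataFusion code: 
`hash_sum` and `sorted_sum` are hypothetical names, and the rows are plain 
`(key, value)` tuples rather than Arrow record batches. The hash variant has 
to hold one accumulator per distinct group until the input ends, while the 
sorted variant assumes the input is already ordered by the group key and 
holds only the current group.

```rust
use std::collections::HashMap;

/// Hash aggregation sketch: keeps one accumulator per distinct key
/// and can only emit results after all input has been consumed.
fn hash_sum(rows: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut state: HashMap<u32, i64> = HashMap::new();
    for &(key, value) in rows {
        *state.entry(key).or_insert(0) += value;
    }
    state
}

/// Sort-based (streaming) aggregation sketch: assumes `rows` is ordered
/// by key, so the only grouping state is the group currently being built.
/// A finished group is emitted as soon as the key changes.
fn sorted_sum(rows: &[(u32, i64)]) -> Vec<(u32, i64)> {
    let mut out = Vec::new();
    let mut current: Option<(u32, i64)> = None;
    for &(key, value) in rows {
        // Same key as the open group: fold the value in and move on.
        if let Some((k, acc)) = current.as_mut() {
            if *k == key {
                *acc += value;
                continue;
            }
        }
        // Key changed (or first row): emit the finished group, open a new one.
        if let Some(done) = current.replace((key, value)) {
            out.push(done);
        }
    }
    // Flush the last open group, if any.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

   In this sketch the sorted variant never needs a hash table at all, which 
is the kind of lighter per-operator state a separate Exec would make possible.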
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
