mingmwang commented on PR #6034: URL: https://github.com/apache/arrow-datafusion/pull/6034#issuecomment-1521324807
@ozankabak @mustafasrepo I strongly suggest a separate implementation (Exec) for streaming aggregation. This is similar to how we separate `HashJoinExec`/`SortMergeJoinExec` and `UnionExec`/`InterleaveExec`.

With the physical plans split, each plan clearly conveys which physical operators it is composed of, and we can keep each operator's code base (HashAggregation and SortAggregation) relatively simple. We can also keep a relatively lightweight grouping state for each operator. The memory layout of the grouping state is critical for performance, and for hash aggregation we currently still have a large performance gap compared with DuckDB.
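
To make the point about grouping state concrete, here is a minimal sketch in plain Rust. It is not DataFusion code: the function names `hash_aggregate_sum` and `streaming_aggregate_sum` and the `(u32, i64)` row shape are hypothetical and chosen only for illustration. It assumes a single integer group key and a `SUM` aggregate, and for the streaming variant it assumes the input is already sorted by the group key.

```rust
use std::collections::HashMap;

/// Hash-style aggregation (hypothetical helper, not a DataFusion API).
/// Input order is arbitrary, so the state of every group has to stay
/// resident until the whole input has been consumed.
fn hash_aggregate_sum(rows: &[(u32, i64)]) -> HashMap<u32, i64> {
    let mut state: HashMap<u32, i64> = HashMap::new();
    for &(key, value) in rows {
        *state.entry(key).or_insert(0) += value;
    }
    state
}

/// Streaming (sorted) aggregation (hypothetical helper, not a DataFusion API).
/// ASSUMPTION: `rows` is sorted by key. Only the accumulator of the current
/// group is live, and a group is emitted as soon as the key changes.
fn streaming_aggregate_sum(rows: &[(u32, i64)]) -> Vec<(u32, i64)> {
    let mut out = Vec::new();
    let mut current: Option<(u32, i64)> = None;
    for &(key, value) in rows {
        // Same key as the open group: fold the value into its accumulator.
        if let Some((k, sum)) = current.as_mut() {
            if *k == key {
                *sum += value;
                continue;
            }
        }
        // Key changed (or first row): emit the finished group, open a new one.
        if let Some(done) = current.replace((key, value)) {
            out.push(done);
        }
    }
    // Emit the last open group, if any.
    if let Some(done) = current {
        out.push(done);
    }
    out
}
```

The two functions share almost no state-handling logic: the hash variant needs a map whose layout dominates performance, while the streaming variant needs a single accumulator and can emit results incrementally. That difference is the kind of thing separate operators would let each implementation optimize independently.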
