xiedeyantu commented on PR #21088: URL: https://github.com/apache/datafusion/pull/21088#issuecomment-4106184220
> I think a rewrite like this might be useful, but I think it can also hurt performance because of the join on grouping keys. So I think it needs to have a config value (off by default) or when enabled some benchmarks showing that it is better in large majority of the cases.
>
> I am also wondering if mostly for memory usage a `GroupsAccumulator` for distinct count / sum might give similar/more improvements.

@Dandandan Thank you for the explanation. It’s true that this would add a hash join, but if aggregation can be performed in parallel, there might be advantages in scenarios with two or more `COUNT(DISTINCT)` operations. I agree to run performance tests across multiple scenarios to evaluate the actual results.
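To make the trade-off concrete, here is a minimal sketch in plain Python of the kind of rewrite under discussion. It is not DataFusion code and not the PR's implementation; the function names and sample rows are hypothetical, and it assumes no NULL values. Each `COUNT(DISTINCT)` is computed in its own independent branch (which could run in parallel), and the branches are then joined on the grouping key, which is the extra join cost mentioned above.

```python
from collections import defaultdict

# Hypothetical sample rows: (group_key, col_a, col_b). No NULLs assumed.
ROWS = [
    ("a", 1, "x"),
    ("a", 1, "y"),
    ("a", 2, "y"),
    ("b", 3, "z"),
    ("b", 3, "z"),
]


def single_aggregate(rows):
    """Baseline: one aggregation holding a distinct set per column per group."""
    state = defaultdict(lambda: (set(), set()))
    for key, a, b in rows:
        state[key][0].add(a)
        state[key][1].add(b)
    return {key: (len(sa), len(sb)) for key, (sa, sb) in state.items()}


def count_distinct_branch(rows, col):
    """One independent branch: deduplicate on (key, column), then count per key."""
    deduped = {(row[0], row[col]) for row in rows}
    counts = defaultdict(int)
    for key, _ in deduped:
        counts[key] += 1
    return dict(counts)


def rewritten(rows):
    """Rewrite: one branch per distinct column, joined on the grouping key."""
    left = count_distinct_branch(rows, 1)
    right = count_distinct_branch(rows, 2)
    # The join on the grouping key is the added cost of this plan shape.
    return {key: (left[key], right[key]) for key in left.keys() & right.keys()}


if __name__ == "__main__":
    assert single_aggregate(ROWS) == rewritten(ROWS)
    print(rewritten(ROWS))
```

Both plans return the same counts on this data. Which one is faster depends on the number of groups, the cardinality of each distinct column, and how much parallelism is available, which is why benchmarks across several scenarios are needed.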
