alamb commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2281723372
> Thank you @alamb 🙏. Let me analyze it further 🤔

In order to actually generate the output in multiple batches and gain performance, I think we would need to change:

1. The `GroupValues` storage (so that it never creates a large contiguous range)
2. The `GroupsAccumulators` likewise, to manage their internal state as multiple chunks rather than as a single chunk

This would likely require some sort of API change to the accumulators / etc.

I wonder if we could find some way to do the implementation incrementally.
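
For illustration only, here is a minimal sketch of what "internal state as multiple chunks" could look like for a simple sum accumulator. It is not DataFusion's `GroupsAccumulator` API: `ChunkedSums`, `chunk_size`, `update`, and `emit` are hypothetical names, and the state is plain `Vec<i64>` rather than Arrow arrays. The point is only that state grows one fixed-size chunk at a time and can be emitted chunk by chunk, so no large contiguous allocation is ever made.

```rust
/// Hypothetical chunked per-group state (assumption: not a DataFusion type).
struct ChunkedSums {
    /// Number of groups stored in each chunk.
    chunk_size: usize,
    /// Per-group sums, split into fixed-size chunks.
    chunks: Vec<Vec<i64>>,
    /// Highest group index seen so far, plus one.
    num_groups: usize,
}

impl ChunkedSums {
    fn new(chunk_size: usize) -> Self {
        assert!(chunk_size > 0, "chunk_size must be non-zero");
        Self { chunk_size, chunks: Vec::new(), num_groups: 0 }
    }

    /// Add `value` to the sum for `group_index`, allocating whole chunks as needed.
    fn update(&mut self, group_index: usize, value: i64) {
        let chunk = group_index / self.chunk_size;
        let offset = group_index % self.chunk_size;
        // Grow by appending new chunks; existing chunks are never reallocated.
        while self.chunks.len() <= chunk {
            self.chunks.push(vec![0; self.chunk_size]);
        }
        self.chunks[chunk][offset] += value;
        self.num_groups = self.num_groups.max(group_index + 1);
    }

    /// Hand back the state as one output batch per chunk, without concatenating.
    fn emit(mut self) -> Vec<Vec<i64>> {
        // Trim the final chunk to the groups that actually exist.
        let full_chunks = self.chunks.len().saturating_sub(1);
        let used_in_last = self.num_groups - full_chunks * self.chunk_size;
        if let Some(last) = self.chunks.last_mut() {
            last.truncate(used_in_last);
        }
        self.chunks
    }
}

fn main() {
    let mut sums = ChunkedSums::new(2);
    for (group, value) in [(0, 10), (1, 20), (2, 30), (0, 5), (4, 7)] {
        sums.update(group, value);
    }
    // Each inner vector would correspond to one output batch.
    println!("{:?}", sums.emit());
}
```

A real change would also need the group keys themselves (the `GroupValues` side) to be stored the same way, which is why an API change seems likely.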
