Rachelint commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2282280596
> > Thank you @alamb 🙏. Let me analyze it further 🤔
>
> In order to actually generate the output in multiple batches and gain performance, I think we would need to change:
>
> 1. The `GroupValues` storage (so that it never creates a large contiguous range)
> 2. The `GroupsAccumulators` likewise to manage the internal state as multiple chunks and not as single chunks
>
> This would likely require some sort of API change to the accumulators / etc
>
> I wonder if we could find some way to do the implementation incrementally

I agree. In the end it should be a big change that switches the group values and related states to be managed by block, like DuckDB, and I am working on this.

But maybe just splitting the emit result still has benefits? It seems it would also avoid calling the `slice` function many times, which really costs CPU.
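
To make the "managed by block" idea concrete, here is a rough sketch in plain Rust. It is not DataFusion code: `BlockedValues`, `emit_all`, and `emit_in_blocks` are hypothetical names, and a `Vec<Vec<T>>` stands in for whatever storage the real `GroupValues` and accumulators would use. It only illustrates that when state is already stored in fixed-size blocks, each block can be handed out as one output batch without slicing a large contiguous buffer.

```rust
/// Hypothetical block-based storage (illustration only, not a DataFusion API):
/// values live in fixed-capacity blocks instead of one contiguous buffer.
struct BlockedValues<T> {
    block_size: usize,
    blocks: Vec<Vec<T>>,
}

impl<T> BlockedValues<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: Vec::new() }
    }

    /// Append a value, opening a new block when the last one is full.
    fn push(&mut self, value: T) {
        let needs_new_block = self
            .blocks
            .last()
            .map_or(true, |block| block.len() == self.block_size);
        if needs_new_block {
            self.blocks.push(Vec::with_capacity(self.block_size));
        }
        // A block is guaranteed to exist here because of the check above.
        self.blocks.last_mut().unwrap().push(value);
    }

    /// Emit all stored values as one batch per block.
    /// Each block is moved out as-is, so no slicing or copying is needed.
    fn emit_all(&mut self) -> Vec<Vec<T>> {
        std::mem::take(&mut self.blocks)
    }
}

/// Small helper for the example: store `n` values and emit them block by block.
fn emit_in_blocks(n: u64, block_size: usize) -> Vec<Vec<u64>> {
    let mut values = BlockedValues::new(block_size);
    for i in 0..n {
        values.push(i);
    }
    values.emit_all()
}
```

With 10 values and a block size of 4, `emit_in_blocks(10, 4)` yields three batches of 4, 4, and 2 values. The real change would also need the accumulators to track state per block, which this sketch leaves out.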
