alamb commented on PR #11758:
URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2281723372

   > Thank you @alamb 🙏. Let me analyze it further 🤔
   
   In order to actually generate the output in multiple batches and gain performance, I think we would need to change:
   1. The `GroupValues` storage (so that it never creates a large contiguous range)
   2. The `GroupsAccumulators`, likewise, to manage their internal state as multiple chunks rather than as a single chunk
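A minimal Rust sketch of the chunked idea, assuming per-group state is a plain `Vec`. `BlockedVec` is a hypothetical type, not an existing DataFusion structure. It stores per-group state in fixed-size blocks, so no single allocation grows past `block_size`, and the final output can be emitted one block at a time instead of as one large contiguous array. The sketch also assumes emission only starts after all input is consumed, so popping a block never invalidates group indices that are still in use.

```rust
use std::collections::VecDeque;

/// Hypothetical block-based storage for per-group accumulator state
/// (illustrative only, not a DataFusion type). Group `g` lives in block
/// `g / block_size` at offset `g % block_size`.
struct BlockedVec<T> {
    block_size: usize,
    blocks: VecDeque<Vec<T>>,
}

impl<T: Clone> BlockedVec<T> {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: VecDeque::new() }
    }

    /// Total number of groups currently stored.
    fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }

    /// Make groups `0..total_groups` addressable, filling new slots with `default`.
    /// Every block except the last is always full, which keeps index math simple.
    fn resize(&mut self, total_groups: usize, default: T) {
        let mut len = self.len();
        while len < total_groups {
            // Start a new block when there is none yet or the last one is full.
            if self.blocks.back().map_or(true, |b| b.len() == self.block_size) {
                self.blocks.push_back(Vec::with_capacity(self.block_size));
            }
            let block = self.blocks.back_mut().expect("a block was just ensured");
            let n = (self.block_size - block.len()).min(total_groups - len);
            block.resize(block.len() + n, default.clone());
            len += n;
        }
    }

    /// Mutable access to the state of group `g`.
    fn get_mut(&mut self, g: usize) -> &mut T {
        let (block, offset) = (g / self.block_size, g % self.block_size);
        &mut self.blocks[block][offset]
    }

    /// Hand out the oldest block as one output batch without copying the rest.
    fn emit_next_block(&mut self) -> Option<Vec<T>> {
        self.blocks.pop_front()
    }
}

fn main() {
    // Hypothetical SUM(value) GROUP BY key; the group indices would normally
    // come from the group-values hash table.
    let group_indices = [0usize, 1, 2, 0, 4, 3, 4];
    let values = [10i64, 20, 30, 1, 5, 7, 2];

    let mut sums: BlockedVec<i64> = BlockedVec::new(2);
    sums.resize(5, 0);
    for (&g, &v) in group_indices.iter().zip(values.iter()) {
        *sums.get_mut(g) += v;
    }

    // Each block becomes one output batch: [11, 20], then [30, 7], then [7].
    while let Some(batch) = sums.emit_next_block() {
        println!("{:?}", batch);
    }
}
```

The block size here plays the role of a target batch size; the same layout would have to be mirrored in `GroupValues` so group keys and accumulator state are emitted in matching chunks.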
   
   This would likely require some sort of API change to the accumulators, etc.
   
   I wonder if we could find some way to do the implementation incrementally.
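One hypothetical way to keep such an API change incremental, sketched in Rust: block-based emission becomes an opt-in capability that defaults to `false`, so existing accumulators compile unchanged and keep the single-batch path, and the operator only switches modes when every accumulator opts in. `EmitMode`, `GroupsAccumulatorSketch`, `ContiguousSum`, `BlockedSum`, and `choose_mode` are illustrative names, not DataFusion's actual API.

```rust
use std::collections::VecDeque;

/// Hypothetical emit modes (illustrative only).
#[derive(Debug, Clone, Copy, PartialEq)]
enum EmitMode {
    /// Emit every group as one contiguous array (the existing behavior).
    All,
    /// Emit the next internally stored block.
    NextBlock,
}

/// Hypothetical accumulator trait with an opt-in blocked-emission capability.
trait GroupsAccumulatorSketch {
    /// Return output values for the requested mode; an empty Vec means nothing is left.
    fn emit(&mut self, mode: EmitMode) -> Vec<i64>;

    /// Defaults to `false`, so existing implementations need no changes.
    fn supports_blocked_emit(&self) -> bool {
        false
    }
}

/// An "old-style" accumulator holding one contiguous state vector.
struct ContiguousSum {
    sums: Vec<i64>,
}

impl GroupsAccumulatorSketch for ContiguousSum {
    fn emit(&mut self, _mode: EmitMode) -> Vec<i64> {
        // Only ever called with `All`, because it does not opt in.
        std::mem::take(&mut self.sums)
    }
}

/// A "new-style" accumulator whose state is already split into blocks.
struct BlockedSum {
    blocks: VecDeque<Vec<i64>>,
}

impl GroupsAccumulatorSketch for BlockedSum {
    fn emit(&mut self, mode: EmitMode) -> Vec<i64> {
        match mode {
            EmitMode::All => self.blocks.drain(..).flatten().collect(),
            EmitMode::NextBlock => self.blocks.pop_front().unwrap_or_default(),
        }
    }

    fn supports_blocked_emit(&self) -> bool {
        true
    }
}

/// Use blocked output only when every accumulator supports it.
fn choose_mode(accs: &[Box<dyn GroupsAccumulatorSketch>]) -> EmitMode {
    if accs.iter().all(|a| a.supports_blocked_emit()) {
        EmitMode::NextBlock
    } else {
        EmitMode::All
    }
}

fn main() {
    let mixed: Vec<Box<dyn GroupsAccumulatorSketch>> = vec![
        Box::new(ContiguousSum { sums: vec![1, 2, 3] }),
        Box::new(BlockedSum { blocks: VecDeque::from(vec![vec![1, 2], vec![3]]) }),
    ];
    // A single legacy accumulator keeps the existing single-batch path.
    assert_eq!(choose_mode(&mixed), EmitMode::All);

    let mut blocked: Vec<Box<dyn GroupsAccumulatorSketch>> = vec![Box::new(BlockedSum {
        blocks: VecDeque::from(vec![vec![1, 2], vec![3]]),
    })];
    let mode = choose_mode(&blocked);
    assert_eq!(mode, EmitMode::NextBlock);
    println!("{:?}", blocked[0].emit(mode)); // [1, 2]
    println!("{:?}", blocked[0].emit(mode)); // [3]
}
```

The trade-off in this sketch is an up-front capability check in the operator; in exchange, accumulators could be migrated to chunked state one at a time.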


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.