[I] GroupedHashAggregateStream should create smaller spill batches [arrow-datafusion]

via GitHub Tue, 31 Oct 2023 03:44:57 -0700


milenkovicm opened a new issue, #8003:
URL: https://github.com/apache/arrow-datafusion/issues/8003


   ### Is your feature request related to a problem or challenge?
   
   At the moment, GroupedHashAggregateStream spills its state as a single batch. 
This is not optimal for merging, because reading the spill file back loads the 
whole file into memory as a single batch. 
   
   ### Describe the solution you'd like
   
   I'd like to split the spill batch into smaller chunks, using as the chunk 
size the same default batch size that the configuration property uses.
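
A minimal sketch of the chunking idea, not the actual DataFusion change. `Batch` is a hypothetical stand-in for an Arrow `RecordBatch`, and `split_into_chunks` and its `batch_size` parameter are illustrative names. With real Arrow batches, the zero-copy `RecordBatch::slice(offset, length)` would take the place of the copying `slice` below.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: one Vec per column,
/// with every column holding the same number of rows.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    columns: Vec<Vec<i64>>,
}

impl Batch {
    /// Number of rows, taken from the first column (0 if there are no columns).
    fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, |c| c.len())
    }

    /// Copy rows `offset..offset + length` out of every column.
    /// Arrow's `RecordBatch::slice` has the same shape but does not copy.
    fn slice(&self, offset: usize, length: usize) -> Batch {
        Batch {
            columns: self
                .columns
                .iter()
                .map(|c| c[offset..offset + length].to_vec())
                .collect(),
        }
    }
}

/// Split `batch` into consecutive chunks of at most `batch_size` rows.
/// An empty batch produces no chunks.
fn split_into_chunks(batch: &Batch, batch_size: usize) -> Vec<Batch> {
    assert!(batch_size > 0, "batch_size must be positive");
    let total = batch.num_rows();
    let mut chunks = Vec::new();
    let mut offset = 0;
    while offset < total {
        // The last chunk may be shorter than `batch_size`.
        let length = batch_size.min(total - offset);
        chunks.push(batch.slice(offset, length));
        offset += length;
    }
    chunks
}
```

Writing each chunk to the spill file separately would let the merge phase read one chunk at a time instead of materializing the whole file as one batch.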
   
   ### Describe alternatives you've considered
   
   I have considered setting the batch size to a fixed value or reading it from 
the configuration property, but I have not done so for now, as it would be a 
bigger change. 
   
   ### Additional context
   
   Relates to #7858


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
