rluvaton commented on issue #15271:
URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2773463974

   While debugging a performance issue in `AggregateExec`, I noticed that 
we keep all spilled files open (`RefCountedTempFile` keeps the `tempfile`, 
which holds a `File`).
   
   Also, when merging, we read at least one batch from every spill file:
   
https://github.com/apache/datafusion/blob/73171986166e3f83ba2b5f8e5ac2f85463dadb28/datafusion/physical-plan/src/aggregates/row_hash.rs#L1059-L1062
   
   So if I have a lot of spill files, or if every batch is really huge (for 
example, it contains very large lists, like the result of `array_agg` on a 
large dataset), we hold all of this in memory at the same time.
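
   To make the concern concrete, here is a rough sketch of the arithmetic, not DataFusion code. `SpillFile` and `merge_memory_floor` are hypothetical names introduced only for illustration, and the file count and batch size are made-up numbers. It assumes the merge buffers exactly one batch per spill file, which is the minimum described above.

```rust
/// Hypothetical description of one spill file, for illustration only.
struct SpillFile {
    /// Size in bytes of the first batch the merge has to load from this file.
    first_batch_bytes: usize,
}

/// Lower bound on the memory held when a merge buffers one batch per spill file.
fn merge_memory_floor(files: &[SpillFile]) -> usize {
    files.iter().map(|f| f.first_batch_bytes).sum()
}

fn main() {
    // Made-up workload: 200 spill files, each starting with a 64 MiB batch.
    let files: Vec<SpillFile> = (0..200)
        .map(|_| SpillFile { first_batch_bytes: 64 * 1024 * 1024 })
        .collect();

    // One open file handle per spill file, plus one buffered batch each.
    println!("open file handles: {}", files.len());
    println!(
        "minimum buffered: {} MiB",
        merge_memory_floor(&files) / (1024 * 1024)
    );
}
```

   The floor grows linearly with both the number of spill files and the batch size, so either one alone can be enough to exhaust the memory budget.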
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

