rluvaton commented on issue #15271: URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2773463974
While debugging a performance issue in `AggregateExec`, I noticed that we keep all spilled files open (through `RefCountedTempFile`, since it keeps a `tempfile`, which holds a `File`). Also, when merging, we read at least 1 batch from every spill file: https://github.com/apache/datafusion/blob/73171986166e3f83ba2b5f8e5ac2f85463dadb28/datafusion/physical-plan/src/aggregates/row_hash.rs#L1059-L1062 So if I have a lot of spill files, or if every batch is really huge (for example, it contains very large lists, like the result of `array_agg` on a large dataset), all of this is held in memory at the same time.
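
To put a rough number on this, here is a minimal sketch of the lower bound, assuming the merge holds exactly one decoded batch per spill file at the same time. `min_merge_memory_bytes` is a hypothetical helper, not a DataFusion API, and the sizes are made-up values for illustration:

```rust
/// Hypothetical helper (not part of DataFusion): lower bound on the bytes
/// resident during the merge, assuming one batch per spill file is held
/// in memory at once.
fn min_merge_memory_bytes(first_batch_sizes: &[u64]) -> u64 {
    first_batch_sizes.iter().sum()
}

fn main() {
    // Made-up numbers: 200 spill files, each with a 64 MiB first batch.
    let sizes = vec![64 * 1024 * 1024u64; 200];
    let total = min_merge_memory_bytes(&sizes);
    println!("{} MiB", total / (1024 * 1024));
}
```

Under this assumption the bound grows linearly with both the number of spill files and the batch size, so either one alone can make it large.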