[PR] perf: add spark.comet.exec.shuffle.maxBufferedBatches config [datafusion-comet]

via GitHub Thu, 26 Mar 2026 07:17:55 -0700


andygrove opened a new pull request, #3800:
URL: https://github.com/apache/datafusion-comet/pull/3800


   ## Summary
   
   - Adds a `spark.comet.exec.shuffle.maxBufferedBatches` config that limits the
     number of batches buffered in memory before spilling during native shuffle.
     Setting a small value causes earlier spilling, which reduces peak executor
     memory usage at the cost of more disk I/O. The default of 0 preserves the
     existing behavior (spill only when the memory pool is exhausted).
   - Fixes a "too many open files" issue in which each partition that had
     spilled kept its spill file descriptor open for the lifetime of the task.
     The spill file is now closed after each spill event and reopened in append
     mode for the next one, so file-descriptor usage is proportional to the
     number of active writes rather than the total number of partitions that
     have spilled.
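
The batch limit in the first bullet can be illustrated with a minimal sketch. This is not Comet's actual shuffle writer: `ShuffleBuffer`, `should_spill`, and `insert` are hypothetical names, a batch is modeled as a `Vec<u64>`, and the executor memory pool is simulated by a fixed byte budget. The assumed rule is that a spill happens when the pool cannot fit the next batch or, if `max_buffered_batches` is non-zero, when the buffer already holds that many batches.

```rust
/// Hypothetical stand-in for a record batch: just a vector of values.
type Batch = Vec<u64>;

/// Minimal model of one shuffle writer's in-memory buffer (hypothetical).
/// `max_buffered_batches == 0` means "no batch limit": spill only when the
/// simulated memory pool cannot fit the next batch.
struct ShuffleBuffer {
    max_buffered_batches: usize,
    pool_capacity_bytes: usize,
    used_bytes: usize,
    buffered: Vec<Batch>,
    spill_count: usize,
}

impl ShuffleBuffer {
    fn new(max_buffered_batches: usize, pool_capacity_bytes: usize) -> Self {
        Self {
            max_buffered_batches,
            pool_capacity_bytes,
            used_bytes: 0,
            buffered: Vec::new(),
            spill_count: 0,
        }
    }

    /// Approximate in-memory size of a batch.
    fn batch_size(batch: &Batch) -> usize {
        batch.len() * std::mem::size_of::<u64>()
    }

    /// Spill if the pool is exhausted, or if a non-zero batch limit is reached.
    fn should_spill(&self, incoming_bytes: usize) -> bool {
        let pool_exhausted = self.used_bytes + incoming_bytes > self.pool_capacity_bytes;
        let limit_reached = self.max_buffered_batches > 0
            && self.buffered.len() >= self.max_buffered_batches;
        pool_exhausted || limit_reached
    }

    fn insert(&mut self, batch: Batch) {
        let size = Self::batch_size(&batch);
        // Never "spill" an empty buffer, e.g. when one batch exceeds the pool.
        if !self.buffered.is_empty() && self.should_spill(size) {
            self.spill();
        }
        self.used_bytes += size;
        self.buffered.push(batch);
    }

    fn spill(&mut self) {
        // A real writer would serialize the buffered batches to disk here.
        self.buffered.clear();
        self.used_bytes = 0;
        self.spill_count += 1;
    }
}
```

With `max_buffered_batches = 2` and an ample byte budget, inserting ten batches triggers four spills and leaves two batches buffered; with the limit at 0, the same inserts never spill. This mirrors the trade-off in the PR description: a smaller limit caps buffered data at the cost of more spill events.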

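The file-descriptor fix in the second bullet follows a common pattern: keep only the spill file's path between spills, and open the file in append mode for each spill event. The sketch below assumes a hypothetical `PartitionSpill` type and uses only `std::fs::OpenOptions` from the standard library; Comet's real spill code may be structured differently.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical per-partition spill state. It remembers where its spill data
/// lives but does not hold a `File` (and thus an open descriptor) between
/// spill events.
struct PartitionSpill {
    path: PathBuf,
    spilled_bytes: u64,
}

impl PartitionSpill {
    fn new(path: PathBuf) -> Self {
        Self { path, spilled_bytes: 0 }
    }

    /// Appends one spill's worth of serialized data, then closes the file.
    fn spill(&mut self, data: &[u8]) -> io::Result<()> {
        // `create(true)` lets the first spill create the file; later spills
        // reopen it in append mode so earlier data is preserved.
        let mut file = OpenOptions::new()
            .create(true)
            .append(true)
            .open(&self.path)?;
        file.write_all(data)?;
        file.flush()?;
        self.spilled_bytes += data.len() as u64;
        // `file` goes out of scope when this function returns, closing the descriptor.
        Ok(())
    }
}
```

Because the `File` is dropped at the end of each `spill` call, the number of open descriptors at any moment is bounded by the spills currently in progress, not by how many partitions have ever spilled. The cost is an extra open/close pair per spill event.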

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
