andygrove opened a new pull request, #3800: URL: https://github.com/apache/datafusion-comet/pull/3800
## Summary

- Adds `spark.comet.exec.shuffle.maxBufferedBatches` config to limit the number of batches buffered in memory before spilling during native shuffle. Setting a small value causes earlier spilling, reducing peak memory usage on executors at the cost of more disk I/O. The default of 0 preserves existing behavior (spill only when the memory pool is exhausted).
- Fixes a too-many-open-files issue where each partition held one spill file descriptor open for the lifetime of the task. The spill file is now closed after each spill event and reopened in append mode for the next, keeping FD usage proportional to active writes rather than total partitions that have spilled.
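
The two changes above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: `PartitionBuffer`, its fields, and the use of `Vec<u8>` as a stand-in for a record batch are all hypothetical names chosen for illustration. It assumes a count-based limit where 0 means "no limit", and a spill routine that opens the file in append mode and lets the handle drop (closing the descriptor) before returning.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::PathBuf;

/// Hypothetical per-partition buffer (illustration only, not Comet's type).
struct PartitionBuffer {
    spill_path: PathBuf,
    /// Stand-in for buffered record batches.
    batches: Vec<Vec<u8>>,
    /// Mirrors the idea of `maxBufferedBatches`: 0 disables the count limit.
    max_buffered_batches: usize,
    /// Number of spill events so far.
    spill_count: usize,
}

impl PartitionBuffer {
    fn new(spill_path: PathBuf, max_buffered_batches: usize) -> Self {
        Self {
            spill_path,
            batches: Vec::new(),
            max_buffered_batches,
            spill_count: 0,
        }
    }

    /// Buffer a batch; spill early once the configured count is reached.
    fn push(&mut self, batch: Vec<u8>) -> io::Result<()> {
        self.batches.push(batch);
        if self.max_buffered_batches > 0
            && self.batches.len() >= self.max_buffered_batches
        {
            self.spill()?;
        }
        Ok(())
    }

    /// Append all buffered batches to the spill file, then close it.
    fn spill(&mut self) -> io::Result<()> {
        if self.batches.is_empty() {
            return Ok(());
        }
        // Reopen in append mode for every spill event instead of keeping
        // one long-lived handle per partition.
        let mut file = OpenOptions::new()
            .create(true)
            .append(true)
            .open(&self.spill_path)?;
        for batch in self.batches.drain(..) {
            file.write_all(&batch)?;
        }
        file.flush()?;
        self.spill_count += 1;
        Ok(())
        // `file` is dropped here, which releases the file descriptor.
    }
}
```

In this sketch a partition holds a descriptor only while `spill` is running, so the number of open files tracks concurrent spill writes rather than the number of partitions that have ever spilled. The trade-off is one extra open/close pair per spill event.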
