andygrove commented on code in PR #3941: URL: https://github.com/apache/datafusion-comet/pull/3941#discussion_r3076496348
########## docs/source/user-guide/latest/tuning.md: ########## @@ -154,6 +154,24 @@ partitioning keys. Columns that are not partitioning keys may contain complex ty Comet Columnar shuffle is JVM-based and supports `HashPartitioning`, `RoundRobinPartitioning`, `RangePartitioning`, and `SinglePartitioning`. This shuffle implementation supports complex data types as partitioning keys. +### Shuffle Spill Tuning + +The native shuffle writer buffers input batches in memory and periodically spills them to disk. Two mechanisms +control when spilling occurs: + +1. **Memory pressure**: When the memory pool rejects an allocation, the writer spills its buffered data to disk. + +2. **Batch spill limit**: The writer also spills after buffering a fixed number of input batches, regardless of + memory availability. This prevents the writer from accumulating too much data, which can degrade throughput + due to poor cache locality during the final write phase. + +The batch spill limit is configured via `spark.comet.exec.shuffle.batchSpillLimit` (default: 100). Setting it +to 0 disables this threshold, meaning spills only occur under memory pressure. + +In most cases, the default value of 100 provides good performance. If you observe that shuffle throughput Review Comment: I will update this section once I have run benchmarks with different values -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
