[PR] feat: add immediate-mode shuffle partitioner [experimental] [datafusion-comet]

via GitHub Sun, 29 Mar 2026 11:54:13 -0700


andygrove opened a new pull request, #3838:
URL: https://github.com/apache/datafusion-comet/pull/3838


   ## Which issue does this PR close?
   
   Closes #.
   
   ## Rationale for this change
   
   The current "buffered" shuffle mode collects all input batches into memory before partitioning and writing. This works well for small-to-medium datasets but can cause high memory usage and spill pressure for large shuffles. An "immediate" mode that partitions each batch as it arrives can reduce peak memory and improve throughput for many workloads.
   
   ## What changes are included in this PR?
   
   This is an experimental implementation of an immediate-mode shuffle partitioner. Key changes:
   
   - Add `ImmediatePartitioner`, which partitions each input batch on arrival using a single take-then-slice approach, avoiding full-batch buffering
   - Add per-partition in-memory buffers (replacing per-partition temp files) with memory accounting and spill support
   - Extract shared index writer logic and encapsulate buffer access for reuse between partitioners
   - Add `spark.comet.shuffle.mode` config option with `buffered` (default) and `immediate` modes
   - Add standalone shuffle benchmark binary for profiling (`shuffle_bench`)
   - Rename the existing shuffle mode from "Default" to "Buffered" for clarity
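
For readers unfamiliar with the take-then-slice idea mentioned in the first bullet, here is a minimal sketch of how such a pass could work. It is not the code in this PR: `take_then_slice`, `PartitionedBatch`, and the modulo-based partition function are hypothetical names chosen for illustration, and a plain `Vec<i64>` stands in for an Arrow record batch.

```rust
/// Hypothetical result type: rows reordered so that each partition is
/// contiguous, plus offsets such that partition `p` occupies
/// `rows[offsets[p]..offsets[p + 1]]`.
pub struct PartitionedBatch {
    pub rows: Vec<i64>,
    pub offsets: Vec<usize>,
}

impl PartitionedBatch {
    /// "Slice" step: a zero-copy view of one partition's rows.
    pub fn partition(&self, p: usize) -> &[i64] {
        &self.rows[self.offsets[p]..self.offsets[p + 1]]
    }
}

/// Partition one batch with a single gather ("take") followed by slicing.
/// Assumption: rows are assigned to partitions by `value mod num_partitions`;
/// a real shuffle would hash the partitioning columns instead.
pub fn take_then_slice(batch: &[i64], num_partitions: usize) -> PartitionedBatch {
    assert!(num_partitions > 0, "need at least one partition");

    // 1. Compute a partition id for every row.
    let ids: Vec<usize> = batch
        .iter()
        .map(|v| v.rem_euclid(num_partitions as i64) as usize)
        .collect();

    // 2. Count rows per partition, then prefix-sum the counts into offsets.
    let mut offsets = vec![0usize; num_partitions + 1];
    for &id in &ids {
        offsets[id + 1] += 1;
    }
    for p in 0..num_partitions {
        offsets[p + 1] += offsets[p];
    }

    // 3. Build one index array that groups row indices by partition
    //    (a stable counting sort, so input order is kept within a partition).
    let mut cursor = offsets[..num_partitions].to_vec();
    let mut indices = vec![0usize; batch.len()];
    for (row, &id) in ids.iter().enumerate() {
        indices[cursor[id]] = row;
        cursor[id] += 1;
    }

    // 4. "Take" step: gather all rows once, in partition order.
    let rows: Vec<i64> = indices.iter().map(|&i| batch[i]).collect();

    PartitionedBatch { rows, offsets }
}
```

In this sketch, calling `take_then_slice(&[10, 3, 7, 4, 9, 6], 3)` yields offsets `[0, 3, 6, 6]`, with partition 0 holding `[3, 9, 6]`, partition 1 holding `[10, 7, 4]`, and partition 2 empty. The point of the design is that each batch is gathered once, rather than once per output partition, and each partition's rows are then a contiguous slice that can be appended to that partition's buffer.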
   
   ## How are these changes tested?
   
   Existing shuffle tests pass with both modes. The standalone shuffle benchmark binary can be used for performance comparison and profiling.

