[PR] perf: eagerly flush shuffle partitions once batch_size rows accumulated [experimental] [datafusion-comet]

via GitHub Sun, 29 Mar 2026 18:47:02 -0700


andygrove opened a new pull request, #3839:
URL: https://github.com/apache/datafusion-comet/pull/3839


   ## Which issue does this PR close?
   
   N/A - experimental performance optimization
   
   ## Rationale for this change
   
   The current multi-partition shuffle partitioner buffers all input batches in 
memory until `shuffle_write()` is called at the end. This requires large 
off-heap memory allocations, especially at higher scale factors.
   
   This PR modifies the multi-partition partitioner to eagerly flush a 
partition's accumulated row indices into compressed IPC output (via the 
existing spill infrastructure) as soon as that partition reaches `batch_size` 
rows. This bounds the index memory per partition while maintaining the same 
output format and performance as the current approach.
   
   ## What changes are included in this PR?
   
   - In `buffer_partitioned_batch_may_spill`, after accumulating row indices, 
check if any partition has reached `batch_size` rows and flush it immediately 
via `flush_partition()`
   - New `flush_partition()` method that creates a `PartitionedBatchIterator` 
for a single partition, spills the produced batches to disk, and clears the 
indices
   - Made `PartitionedBatchIterator::new` visible to sibling modules
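
The eager-flush check described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `EagerPartitioner`, its fields, and `buffer_batch` are hypothetical names, the `flushed` vector stands in for the compressed IPC output written through the spill infrastructure, and the buffered input batches themselves are not modelled. Only the name `flush_partition` and the `batch_size` threshold come from the description.

```rust
/// Hypothetical stand-in for the partitioner's per-partition index buffers.
struct EagerPartitioner {
    /// Flush threshold, in rows, for each partition.
    batch_size: usize,
    /// Buffered (batch index, row index) pairs for each output partition.
    indices: Vec<Vec<(u32, u32)>>,
    /// Row count of every flush per partition (stand-in for spilled output).
    flushed: Vec<Vec<usize>>,
}

impl EagerPartitioner {
    fn new(num_partitions: usize, batch_size: usize) -> Self {
        Self {
            batch_size,
            indices: vec![Vec::new(); num_partitions],
            flushed: vec![Vec::new(); num_partitions],
        }
    }

    /// Record the partition assignment of each row in one input batch,
    /// then flush every partition that has reached `batch_size` rows.
    fn buffer_batch(&mut self, batch_idx: u32, partition_ids: &[usize]) {
        for (row, &partition) in partition_ids.iter().enumerate() {
            self.indices[partition].push((batch_idx, row as u32));
        }
        for partition in 0..self.indices.len() {
            if self.indices[partition].len() >= self.batch_size {
                self.flush_partition(partition);
            }
        }
    }

    /// Take the accumulated indices for one partition, leaving its buffer
    /// empty. A real implementation would materialize and spill the rows
    /// here; this sketch only records how many rows were flushed.
    fn flush_partition(&mut self, partition: usize) {
        let rows = std::mem::take(&mut self.indices[partition]);
        self.flushed[partition].push(rows.len());
    }
}
```

Because the check runs after a whole input batch has been indexed, a partition can briefly hold more than `batch_size` indices before it is flushed; the bound is per partition and per input batch rather than a hard cap.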
   
   ## How are these changes tested?
   
   Benchmarked on TPC-H SF1000 (Q1):
   - Baseline (main): 65.7s
   - This PR: 63.9s (no regression)
   - For comparison, the immediate-mode partitioner (PR #3838): 111.4s
   
   Full TPC-H SF1000 run in progress.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

