[PR] Add batch pass-through optimization to SortPreservingMergeExec [datafusion]

via GitHub Fri, 03 Apr 2026 03:43:16 -0700


Dandandan opened a new pull request, #21335:
URL: https://github.com/apache/datafusion/pull/21335


   ## Which issue does this PR close?
   
   N/A — performance optimization
   
   ## Rationale for this change
   
   `SortPreservingMergeExec` currently processes every row through the loser 
tree, even when an entire batch from the winning stream is strictly less than 
all other streams' current values. This is common for:
   
   - Time-series data with range-partitioned inputs
   - After hash repartition + sort where partitions have distinct value ranges
   - Any workload where input batches don't overlap
   
   ## What changes are included in this PR?
   
   **Batch pass-through optimization**: After the loser tree selects a winner, 
if the winner's cursor is at the start of a new batch, check whether the 
batch's last value is strictly less than the runner-up's current value (found 
via O(log K) loser-tree walk).
   
   Two fast paths when the check succeeds:
   
   1. **Zero-copy**: When `in_progress` buffer is empty and the full batch 
qualifies, `RecordBatch::slice` returns the batch directly — no interleave, no 
row-level copying.
   
   2. **Bulk-push**: When partial rows are already buffered, append all 
qualifying rows at once via `push_rows`, skipping O(remaining × log K) 
loser-tree comparisons.
   
   ### Files changed
   
   - `cursor.rs`: Add `remaining()`, `last_cmp()`, `advance_by()`, 
`is_at_start()` to `Cursor`
   - `builder.rs`: Add `push_rows()` (bulk append) and `take_batch_slice()` 
(zero-copy emission)
   - `merge.rs`: Add `find_runner_up()`, `can_batch_pass_through()`, modify 
`poll_next_inner`
   - `sort_preserving_merge.rs`: 3 new tests covering non-overlapping, partial 
overlap, and fetch limits
   
   ## Are these changes tested?
   
   Yes — 3 new test cases plus all 18 existing tests pass:
   
   - `test_batch_pass_through_multi_batch`: Multiple non-overlapping batches 
per partition
   - `test_batch_pass_through_with_fetch`: Fetch limit that cuts through a 
pass-through batch
   - `test_batch_pass_through_partial_overlap`: Mixed overlap across 3 
partitions
   
   ## Are there any user-facing changes?
   
   No API changes. Existing behavior is preserved — the optimization is 
strictly an internal fast path that produces identical output.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] Add batch pass-through optimization to SortPreservingMergeExec [datafusion]

Reply via email to