Dandandan opened a new pull request, #23107:
URL: https://github.com/apache/datafusion/pull/23107

   ## Which issue does this PR close?
   
   - N/A — small, self-contained performance improvement to the k-way merge.
   
   ## Rationale for this change
   
   `SortPreservingMergeStream` (the loser-tree k-way merge behind 
`SortPreservingMergeExec`, and `SortExec`'s spill/streaming merge) uses a 
round-robin tie-breaker — enabled by default — to balance how often 
equal-valued streams are polled.
   
   At the root comparison, the tie-breaker path compared the two finalists with 
`==` and then, when they were *not* equal, compared them again with `>`. That is 
up to **two** comparisons per output row at the root, where the non-tie-breaker 
path does one. For cheap comparators (single primitive column) this is negligible, 
but for byte-wise row comparisons (multi-column keys) and string keys the extra 
comparison per row is measurable.
   
   ## What changes are included in this PR?
   
   Collapse the `==` + `>` pair into a single `Ord::cmp` call and match on the 
returned `Ordering` (`Equal` / `Greater` / `Less`). The behavior is identical; 
the change only removes the redundant comparison per row at the root node.
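
The pattern can be illustrated in isolation. This is a minimal sketch, not the actual DataFusion code: `CountedKey`, `RootOutcome`, `root_two_step`, and `root_single_cmp` are hypothetical names made up for this example, and the byte-slice key only stands in for a row-format comparison. The shared counter makes the number of underlying comparisons observable.

```rust
use std::cell::Cell;
use std::cmp::Ordering;

/// Hypothetical stand-in for a cursor's current row: a byte-wise key plus a
/// shared counter that records every underlying comparison.
struct CountedKey<'a> {
    key: &'a [u8],
    comparisons: &'a Cell<usize>,
}

impl Ord for CountedKey<'_> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Every `==`, `>`, or `cmp` ends up here, so each costs one comparison.
        self.comparisons.set(self.comparisons.get() + 1);
        self.key.cmp(other.key)
    }
}

impl PartialOrd for CountedKey<'_> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for CountedKey<'_> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for CountedKey<'_> {}

/// Hypothetical three-way result of comparing the two finalists.
#[derive(Debug, PartialEq)]
enum RootOutcome {
    Tie,
    LeftGreater,
    LeftLess,
}

/// Shape described as the old behavior: `==` first, then `>` when not equal.
/// Unequal keys are compared twice.
fn root_two_step<'a>(a: &CountedKey<'a>, b: &CountedKey<'a>) -> RootOutcome {
    if a == b {
        RootOutcome::Tie
    } else if a > b {
        RootOutcome::LeftGreater
    } else {
        RootOutcome::LeftLess
    }
}

/// Shape described as the new behavior: one `Ord::cmp`, matched three ways.
fn root_single_cmp<'a>(a: &CountedKey<'a>, b: &CountedKey<'a>) -> RootOutcome {
    match a.cmp(b) {
        Ordering::Equal => RootOutcome::Tie,
        Ordering::Greater => RootOutcome::LeftGreater,
        Ordering::Less => RootOutcome::LeftLess,
    }
}
```

In this sketch, two unequal keys cost two comparisons through `root_two_step` and one through `root_single_cmp`, while tied keys cost one comparison in both. The outcomes are the same either way, which is why the change can be behavior-preserving.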
   
   Benchmarked with the existing `sort_preserving_merge` benchmark (1M rows × 3 
partitions, round-robin tie-breaker enabled):
   
   - multi-column (no-tie) merge: now on par with the tie-breaker disabled (the 
redundant comparison was the entire gap)
   - tied single-column and multi-column `u64` merges: ~5–6% faster
   
   ## Are these changes tested?
   
   Covered by the existing tests in `sorts/merge.rs` and 
`sorts/sort_preserving_merge.rs` (including 
`test_round_robin_tie_breaker_success` / `test_round_robin_tie_breaker_fail` 
and the merge ordering tests). This is a behavior-preserving change, so no new 
tests are added.
   
   ## Are there any user-facing changes?
   
   No.
   



