[PR] perf: sort multi-column runs via row format and coalesce them [datafusion]

via GitHub Sat, 27 Jun 2026 13:13:41 -0700


Dandandan opened a new pull request, #23221:
URL: https://github.com/apache/datafusion/pull/23221


   ## Which issue does this PR close?
   
   Follow-up to #23202 (coalesce single-column sort runs). No specific issue.
   
   ## Rationale for this change
   
   #23202 coalesces buffered runs into fewer, larger runs to cut merge fan-in, 
but
   **only for single-column sorts**. Multi-column runs were deliberately left 
one
   run per batch, because coalescing them and sorting the larger run with
   `lexsort_to_indices` *regresses*: `lexsort`'s comparator walks the key 
columns
   one at a time with per-comparison dynamic dispatch (and heap touches for
   strings/dictionaries), and that cost grows super-linearly with run size.
   
   The fix is to change *how* a multi-column run is sorted, not just its size: 
for
   keys whose leading column is an expensive (variable-length or dictionary)
   comparison, encode the key once into the Arrow **row format** and argsort the
   row indices with a cheap `memcmp`. This is the same comparison the streaming
   merge already uses (`RowCursorStream`), so it is a natural fit — and it 
removes
   exactly the comparator cost that made coalescing regress. With that in place 
we
   can coalesce **all** multi-column runs.
   
   A microbenchmark (`sort_indices`, added here) isolates the index-computation
   cost — `lexsort_to_indices` vs. row-encode+argsort — by key shape and run 
size
   (speedup = lexsort / rowsort; >1 means row format is faster):
   
   | key shape | 1k | 64k | 1M |
   |---|---|---|---|
   | i64 (single) | 0.16× | 0.17× | 0.15× |
   | utf8 high (single) | 0.22× | 0.31× | 0.66× |
   | utf8 tuple | 1.86× | 2.46× | 3.38× |
   | dict tuple | 2.06× | 2.66× | 2.72× |
   | utf8 view tuple | 0.67× | 1.10× | 1.48× |
   | mixed tuple (i64-leading) | 0.49× | 0.67× | 0.78× |
   
   This is what motivates the gate: row format wins big for string/dict-leading
   keys (and more as runs grow), but **loses** for single-column keys and for
   primitive-leading keys (where `lexsort` short-circuits on the cheap first
   column). So the row-format path is gated to multi-column keys with an 
expensive
   leading column; everything else keeps `lexsort`.
   
   End-to-end (`cargo bench --bench sort`, the `SortExec` cases, vs. current 
main):
   
   | benchmark | speedup |
   |---|---|
   | sort utf8 tuple 100k / 1M | 1.34× / 1.19× |
   | sort utf8 view tuple 100k / 1M | 1.22× / 1.17× |
   | sort utf8 dictionary tuple 100k / 1M | 1.23× / 1.74× |
   | sort mixed dictionary tuple 100k / 1M | 1.47× / 1.81× |
   | sort mixed tuple 100k / 1M | 1.44× / 1.45× |
   | sort mixed tuple with utf8 view 100k / 1M | 1.51× / 1.40× |
   | sort i64 100k / 1M | ~unchanged (noise) |
   
   String/dict-leading tuples win via coalesce + row-format sort; the
   primitive-leading `mixed tuple` wins ~1.4× purely from coalesce + `lexsort` 
on
   larger runs (it now also gets coalesced). Single-column sorts are unchanged.
   
   ## What changes are included in this PR?
   
   - `sorted_indices()` in `sorts/stream.rs`: computes the sorted index 
permutation,
     choosing the Arrow row format for multi-column keys with an expensive 
leading
     column (`use_row_format_sort`) and `lexsort_to_indices` otherwise (single
     column, primitive-leading, or `fetch`/top-k).
   - `IncrementalSortIterator` (the single point every in-memory sort run flows
     through) now calls `sorted_indices()`. No merge/cursor changes.
   - `in_mem_sort_stream` now coalesces **all** runs, not just single-column 
ones.
   - New microbenchmark `datafusion/core/benches/sort_indices.rs`.
   
   ## Are these changes tested?
   
   Existing `sorts::` unit tests pass (NULLs, sort options, multi-column merge,
   spill). Correctness of the row-format ordering is covered by the existing
   multi-column sort tests; the new microbenchmark documents the perf rationale.
   
   ## Are there any user-facing changes?
   
   No (internal performance change only).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[PR] perf: sort multi-column runs via row format and coalesce them [datafusion]

Reply via email to