mbutrovich opened a new pull request, #9683:
URL: https://github.com/apache/arrow-rs/pull/9683

   # Which issue does this PR close?
   
   N/A.
   
   # Rationale for this change
   
   The existing `lexsort_to_indices` uses comparison sort on columnar arrays, 
which is O(n log n × comparison_cost) where comparison cost scales with the 
number of  columns. The Arrow row format (`RowConverter`) produces 
memcmp-comparable byte sequences, making it a natural fit for radix sort — O(n 
× key_width) — which can overcome the encoding overhead by eliminating 
per-comparison column traversal.
   
   Inspired by [DuckDB's sorting 
redesign](https://duckdb.org/2025/09/24/sorting-again), which uses MSD radix 
sort on normalized key prefixes with a comparison sort fallback, this PR adds 
an `radix_sort_to_indices` kernel that operates directly on row-encoded keys. 
Like DuckDB, we limit radix depth to 8 bytes before falling back to comparison 
sort, balancing radix efficiency against diminishing returns on deep recursion.
   
   # What changes are included in this PR?
   
   - **`arrow-row/src/radix.rs`** (new): MSD radix sort on `Rows` with:
     - 256-bucket histogram + out-of-place scatter per byte position
     - Comparison sort fallback for small buckets (≤64 elements) or after 8 
bytes of radix depth
     - Single pre-allocated temp buffer reused across recursion levels
   - **`arrow-row/src/lib.rs`**: Exposes `pub mod radix`
   - **`arrow/benches/lexsort.rs`**: Adds `lexsort_radix` benchmark variant 
alongside existing `lexsort_to_indices` and `lexsort_rows`, and removes a 
duplicate benchmark case
   
   ## Benchmark results
   
   All three variants include the full pipeline (encoding + sort) so the 
comparison against `lexsort_to_indices` (which doesn't encode) is 
apples-to-apples. Here are a subset of results from `cargo bench --bench 
lexsort`. I'll post the full table in a follow-up comment.
   
   | Schema, N | lexsort_to_indices | lexsort_rows | lexsort_radix |
   |---|---|---|---|
   | [i32, i32_opt], 4096 | 86.211 µs | 117.00 µs | 71.987 µs |
   | [i32, i32_opt], 32768 | 866.91 µs | 1.2274 ms | 359.07 µs |
   | [i32, str_opt(16)], 32768 | 860.45 µs | 1.8661 ms | 510.93 µs |
   | [str_opt(16), str(16)], 32768 | 2.4401 ms | 1.6789 ms | 946.89 µs |
   | [3x str], 32768 | 2.4590 ms | 1.9281 ms | 1.2148 ms |
   | [i32_opt, dict], 32768 | 1.1873 ms | 1.3615 ms | 627.81 µs |
   | [dict, dict], 32768 | 499.32 µs | 732.40 µs | 722.44 µs |
   | [3x dict, str(16)], 32768 | 4.1305 ms | 2.0389 ms | 1.5303 ms |
   | [i32_opt, i32_list], 32768 | 1.5171 ms | 2.8395 ms | 1.3393 ms |
   | [i32, i32_list, str(16)], 32768 | 862.46 µs | 2.2412 ms | 1.1633 ms |
   
   Radix sort is the fastest in the majority of cases. The main exception is 
pure low-cardinality dictionary columns where `lexsort_to_indices` avoids 
encoding overhead entirely. The module documentation provides guidance on when 
to use each approach.
   
   # Are these changes tested?
   
   17 tests including:
   - Deterministic tests for integers, strings, multi-column, nulls, all-equal, 
empty, single-element
   - All 4 sort option combinations (ascending/descending × nulls_first)
   - Float64 with NaN/Infinity, booleans
   - Threshold boundary tests (sizes 1–1000 around the fallback threshold)
   - Fuzz test: 100 iterations × 1–4 random columns × random types × random 
sort options × 5–500 rows
   - Cross-validation: verifies radix output matches comparison sort on the 
same `Rows`
   
   # Are there any user-facing changes?
   
   New public API: `arrow_row::radix::radix_sort_to_indices(&Rows) -> Vec<u32>`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to