mbutrovich opened a new pull request, #9683: URL: https://github.com/apache/arrow-rs/pull/9683
# Which issue does this PR close? N/A. # Rationale for this change The existing `lexsort_to_indices` uses comparison sort on columnar arrays, which is O(n log n × comparison_cost) where comparison cost scales with the number of columns. The Arrow row format (`RowConverter`) produces memcmp-comparable byte sequences, making it a natural fit for radix sort — O(n × key_width) — which can overcome the encoding overhead by eliminating per-comparison column traversal. Inspired by [DuckDB's sorting redesign](https://duckdb.org/2025/09/24/sorting-again), which uses MSD radix sort on normalized key prefixes with a comparison sort fallback, this PR adds an `radix_sort_to_indices` kernel that operates directly on row-encoded keys. Like DuckDB, we limit radix depth to 8 bytes before falling back to comparison sort, balancing radix efficiency against diminishing returns on deep recursion. # What changes are included in this PR? - **`arrow-row/src/radix.rs`** (new): MSD radix sort on `Rows` with: - 256-bucket histogram + out-of-place scatter per byte position - Comparison sort fallback for small buckets (≤64 elements) or after 8 bytes of radix depth - Single pre-allocated temp buffer reused across recursion levels - **`arrow-row/src/lib.rs`**: Exposes `pub mod radix` - **`arrow/benches/lexsort.rs`**: Adds `lexsort_radix` benchmark variant alongside existing `lexsort_to_indices` and `lexsort_rows`, and removes a duplicate benchmark case ## Benchmark results All three variants include the full pipeline (encoding + sort) so the comparison against `lexsort_to_indices` (which doesn't encode) is apples-to-apples. Here are a subset of results from `cargo bench --bench lexsort`. I'll post the full table in a follow-up comment. | Schema, N | lexsort_to_indices | lexsort_rows | lexsort_radix | |---|---|---|---| | [i32, i32_opt], 4096 | 86.211 µs | 117.00 µs | 71.987 µs | | [i32, i32_opt], 32768 | 866.91 µs | 1.2274 ms | 359.07 µs | | [i32, str_opt(16)], 32768 | 860.45 µs | 1.8661 ms | 510.93 µs | | [str_opt(16), str(16)], 32768 | 2.4401 ms | 1.6789 ms | 946.89 µs | | [3x str], 32768 | 2.4590 ms | 1.9281 ms | 1.2148 ms | | [i32_opt, dict], 32768 | 1.1873 ms | 1.3615 ms | 627.81 µs | | [dict, dict], 32768 | 499.32 µs | 732.40 µs | 722.44 µs | | [3x dict, str(16)], 32768 | 4.1305 ms | 2.0389 ms | 1.5303 ms | | [i32_opt, i32_list], 32768 | 1.5171 ms | 2.8395 ms | 1.3393 ms | | [i32, i32_list, str(16)], 32768 | 862.46 µs | 2.2412 ms | 1.1633 ms | Radix sort is the fastest in the majority of cases. The main exception is pure low-cardinality dictionary columns where `lexsort_to_indices` avoids encoding overhead entirely. The module documentation provides guidance on when to use each approach. # Are these changes tested? 17 tests including: - Deterministic tests for integers, strings, multi-column, nulls, all-equal, empty, single-element - All 4 sort option combinations (ascending/descending × nulls_first) - Float64 with NaN/Infinity, booleans - Threshold boundary tests (sizes 1–1000 around the fallback threshold) - Fuzz test: 100 iterations × 1–4 random columns × random types × random sort options × 5–500 rows - Cross-validation: verifies radix output matches comparison sort on the same `Rows` # Are there any user-facing changes? New public API: `arrow_row::radix::radix_sort_to_indices(&Rows) -> Vec<u32>`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
