ClSlaid opened a new pull request, #9758: URL: https://github.com/apache/arrow-rs/pull/9758
## Summary

- add a direct `BatchCoalescer::push_batch_with_indices` path for primitive, `Utf8View`, and `BinaryView` columns when the indices are integer typed and non-null
- specialise indexed copying for primitive and byte-view in-progress arrays so supported schemas can coalesce rows directly without materialising an intermediate taken `RecordBatch`
- keep other data types on the existing `take_record_batch` fallback; benchmark work on this branch showed widening the direct path beyond primitive and view arrays regressed `Utf8` and dictionary-backed cases

## Testing

- `cargo test -p arrow-select coalesce --lib`
- `cargo clippy -p arrow-select --lib --tests -- -D warnings`
- `cargo clippy -p arrow --bench coalesce_kernels --features test_utils -- -D warnings`
- `cargo clippy --workspace --all-targets -- -D warnings`

## Benchmarks

- `take: primitive, 8192, nulls: 0, selectivity: 0.01`: `3.5194-3.5796 ms` -> `1.8780-1.9136 ms`
- `take: primitive, 8192, nulls: 0.1, selectivity: 0.01`: `5.5208-5.5708 ms` -> `4.0016-4.1647 ms`
- `take: primitive, 8192, nulls: 0, selectivity: 0.001`: `23.684-23.813 ms` -> `5.9713-6.0137 ms`
- `take: single_utf8view, 8192, nulls: 0, selectivity: 0.01`: `3.0301-3.0830 ms` -> `2.4513-2.4854 ms`
- `take: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.8643-1.8823 ms` -> `1.2706-1.2856 ms`
- `take: single_binaryview, 8192, nulls: 0, selectivity: 0.01`: `3.1346-3.2991 ms` -> `2.7578-2.8539 ms`
- `take: mixed_binaryview (max_string_len=20), 8192, nulls: 0, selectivity: 0.01`: `1.9634-2.0215 ms` -> `1.4117-1.4383 ms`
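
To illustrate the idea behind the direct path described in the summary, here is a minimal std-only sketch. It is not the arrow-rs implementation: `InProgressPrimitive`, `copy_rows_by_indices`, and `take_then_append` are hypothetical names, the column is modelled as `Option<i64>` values, and the indices are modelled as a plain `&[u32]` slice (integer typed, non-null). It contrasts copying selected rows straight into an in-progress buffer with first materialising a taken copy and then appending it.

```rust
/// Hypothetical stand-in for an in-progress primitive column:
/// a values buffer plus one validity flag per row.
struct InProgressPrimitive {
    values: Vec<i64>,
    validity: Vec<bool>,
}

impl InProgressPrimitive {
    fn with_capacity(capacity: usize) -> Self {
        Self {
            values: Vec::with_capacity(capacity),
            validity: Vec::with_capacity(capacity),
        }
    }

    /// Direct path (assumed shape): append `source[i]` for every index `i`
    /// without building an intermediate taken column first.
    fn copy_rows_by_indices(&mut self, source: &[Option<i64>], indices: &[u32]) {
        self.values.reserve(indices.len());
        self.validity.reserve(indices.len());
        for &i in indices {
            match source[i as usize] {
                Some(v) => {
                    self.values.push(v);
                    self.validity.push(true);
                }
                None => {
                    // Null slot: store a placeholder value and mark it invalid.
                    self.values.push(0);
                    self.validity.push(false);
                }
            }
        }
    }

    /// Convert the buffered rows back into `Option<i64>` values.
    fn finish(self) -> Vec<Option<i64>> {
        self.values
            .into_iter()
            .zip(self.validity)
            .map(|(v, valid)| if valid { Some(v) } else { None })
            .collect()
    }
}

/// Fallback-style path: materialise the taken rows, then append them.
fn take_then_append(dst: &mut Vec<Option<i64>>, source: &[Option<i64>], indices: &[u32]) {
    let taken: Vec<Option<i64>> = indices.iter().map(|&i| source[i as usize]).collect();
    dst.extend(taken);
}

fn main() {
    let source: Vec<Option<i64>> = vec![Some(10), None, Some(30), Some(40)];
    let first: [u32; 2] = [3, 0];
    let second: [u32; 2] = [1, 2];

    // Direct path: two pushes coalesce into one buffer.
    let mut direct = InProgressPrimitive::with_capacity(4);
    direct.copy_rows_by_indices(&source, &first);
    direct.copy_rows_by_indices(&source, &second);

    // Fallback path: each push allocates an intermediate taken vector.
    let mut fallback: Vec<Option<i64>> = Vec::new();
    take_then_append(&mut fallback, &source, &first);
    take_then_append(&mut fallback, &source, &second);

    // Both paths must produce the same rows in the same order.
    assert_eq!(direct.finish(), fallback);
}
```

Both paths yield identical rows; the direct one simply skips the per-push intermediate allocation, which is the kind of saving the low-selectivity benchmarks above are measuring.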
