Dandandan opened a new pull request, #9754:
URL: https://github.com/apache/arrow-rs/pull/9754
# Which issue does this PR close?
- Follow-up to #9746 applying the same trick on another hot loop.
# Rationale for this change
The non-null branch of `take_native` — the primitive `take` gather —
bounds-checks each index individually via `values[index.as_usize()]`.
The per-lane branch dominates the loop and blocks autovectorisation.
Reducing a chunk of indices to their maximum with `fold`+`max` has no
early exit, so LLVM lowers the whole check to a SIMD max-reduction; on
aarch64 that's two `ldp q` + three `umax.4s` + one `umaxv.4s` — a
single bounds check per chunk, then the gather.
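
As a rough illustration of the idea above (not the code in this PR), the sketch below gathers with one bounds check per chunk. It assumes `u32` indices and a generic `Copy` value type; the function name `take_chunked` and the use of `get_unchecked` after the check are assumptions made for the example.

```rust
/// Chunk width; the PR description uses `CHUNK = 16`.
const CHUNK: usize = 16;

/// Gathers `values[indices[i]]` with one bounds check per chunk of indices.
/// Hypothetical, simplified stand-in for the non-null `take` path.
pub fn take_chunked<T: Copy>(values: &[T], indices: &[u32]) -> Vec<T> {
    let mut out = Vec::with_capacity(indices.len());
    let mut chunks = indices.chunks_exact(CHUNK);
    for chunk in chunks.by_ref() {
        // Reduction with no early exit: every lane is folded into one maximum.
        let max_idx = chunk.iter().fold(0usize, |acc, &i| acc.max(i as usize));
        // A single check covers all CHUNK indices.
        assert!(max_idx < values.len(), "index out of bounds");
        for &i in chunk {
            // SAFETY: i <= max_idx < values.len(), verified just above.
            out.push(unsafe { *values.get_unchecked(i as usize) });
        }
    }
    // Tail shorter than CHUNK: ordinary checked indexing.
    for &i in chunks.remainder() {
        out.push(values[i as usize]);
    }
    out
}
```

Whether the reduction actually compiles to vector instructions depends on the target and compiler version, so the generated assembly is worth inspecting rather than assuming.
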
The panic is moved into a `#[cold]` helper, so `max_idx` no longer has
to stay live on the hot path for the panic format string. This removes
a per-chunk stack spill (`str x16, [sp, #16]`).
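
A minimal sketch of the cold-path pattern follows. The helper name `oob` comes from the PR description, but its signature, the message text, the extra `#[inline(never)]`, and the `check_chunk` wrapper are all assumptions made for illustration.

```rust
/// Out-of-line panic path. `#[cold]` hints that calls are unlikely;
/// `#[inline(never)]` is an extra assumption to keep the formatting code out of the caller.
#[cold]
#[inline(never)]
fn oob(max_idx: usize, len: usize) -> ! {
    panic!("index out of bounds: the len is {} but the index is {}", len, max_idx);
}

/// Hypothetical hot-path check for one chunk's maximum index.
#[inline]
pub fn check_chunk(max_idx: usize, len: usize) {
    if max_idx >= len {
        // Formatting arguments are only materialised inside `oob`.
        oob(max_idx, len);
    }
}
```
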
Negative values of signed index types sign-extend on `as_usize()` to
values at the top of the `usize` range (`-1` becomes `usize::MAX`), so
negative indices still fail the check; panic behaviour is preserved.
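
The sign-extension argument can be checked with a plain `as` cast, which is assumed here to match what `as_usize()` does for signed index types. The names `widen` and `in_bounds` are hypothetical.

```rust
/// Stand-in for a sign-extending index-to-`usize` conversion (assumption: same as `as`).
pub fn widen(index: i32) -> usize {
    index as usize
}

/// Unsigned comparison after widening: negative indices land far above any real length.
pub fn in_bounds(index: i32, len: usize) -> bool {
    widen(index) < len
}
```

Because the comparison happens after widening, a single unsigned `<` rejects both negative and too-large indices.
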
# What changes are included in this PR?
One-file change in `arrow-select/src/take.rs`:
- `CHUNK = 16` chunked max-reduction bounds check
- `#[cold]` `oob` helper for the panic path
- preallocated output via `Vec::set_len` + `chunks_exact_mut` so the
gather is a straight SIMD store (no `push` / capacity-check
overhead)
Output is written into a `Vec<T>` with uninitialised capacity and
`set_len(len)` up front; `T: ArrowNativeType` is `Copy` with no
`Drop`, and every slot is written before `out` is read.
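
For comparison, here is a sketch of a preallocated gather written against `Vec::spare_capacity_mut`. It is a variant, not the PR's code: it sets the length after the slots are written rather than up front, and it uses per-element checked indexing to stay short. The name `take_prealloc` and the `u32` index type are assumptions.

```rust
use std::mem::MaybeUninit;

/// Gathers into preallocated storage without `push` or per-element capacity checks.
/// Hypothetical helper; bounds checking is left as plain indexing for brevity.
pub fn take_prealloc<T: Copy>(values: &[T], indices: &[u32]) -> Vec<T> {
    let len = indices.len();
    let mut out: Vec<T> = Vec::with_capacity(len);
    // View the first `len` slots of unused capacity as `MaybeUninit<T>`.
    let slots: &mut [MaybeUninit<T>] = &mut out.spare_capacity_mut()[..len];
    for (slot, &i) in slots.iter_mut().zip(indices) {
        slot.write(values[i as usize]);
    }
    // SAFETY: exactly `len` slots were initialised above and `len <= capacity`.
    unsafe { out.set_len(len) };
    out
}
```

If an index is out of bounds, the panic fires while the vector's length is still zero, so no uninitialised element is ever observed.
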
# Are these changes tested?
Yes — covered by existing `take::tests::*`, including
`test_take_out_of_bounds_panic`.
Measured on aarch64 (Apple Silicon) with
`cargo bench --bench take_kernels -- "^take i32"`:
| bench | before | after | Δ |
| --------------------- | ------ | ------ | -------- |
| take i32 512 | 309 ns | 279 ns | **−9.7%** |
| take i32 1024 | 469 ns | 431 ns | **−8.1%** |
| take i32 null indices | 542 ns | 545 ns | no change (unchanged branch) |
# Are there any user-facing changes?
No — no API change, panic behaviour on out-of-bounds indices is
preserved.
🤖 Generated with [Claude Code](https://claude.com/claude-code)