Dandandan opened a new pull request, #9746:
URL: https://github.com/apache/arrow-rs/pull/9746
## Summary

- Replaces the short-circuiting `idx_chunk.iter().all(|&i| (i as usize) < dict_len)` in the bit-packed hot loop of `RleDecoder::get_batch_with_dict` with a `u32` max-reduction. Because `.all` can exit early, it blocks autovectorisation; `fold(0u32, |acc, &i| acc.max(i as u32))` has no early exit, so LLVM lowers the check to a single SIMD max-reduction and reuses the loaded registers for the gather that follows.
- Adds `parquet/benches/rle_dict.rs`, a small, targeted Criterion bench that drives `get_batch_with_dict` directly (`i32` and `String` dictionaries, dictionary sizes 16/256/1024, 8192 values per batch).

## Why

On aarch64 the old code compiled to eight serialised `ldrsw` + `cmp` + `b.ls` sequences per 8-index chunk, followed by eight separate scalar gather loads, one lane at a time. After the change the bounds check is a single SIMD reduction:

```
ldp q1, q0, [x11], #0x20   ; load 8 indices
umax.4s v2, v1, v0         ; lane-wise max
umaxv.4s s2, v2            ; horizontal max
fmov w13, s2
cmp x20, x13               ; one bounds check
b.ls <panic>
```

The loaded registers `v1`/`v0` are then reused for the gather, avoiding the reloads. Negative `i32` indices cast to `u32` become values of at least 2³¹, so the check still rejects them for any realistic dictionary size.

## Measurements

Apple Silicon (aarch64), `cargo bench --bench rle_dict`:

| case          | before   | after    | Δ     |
|---------------|----------|----------|-------|
| str/dict=16   | 59.48 µs | 57.90 µs | −2.6% |
| i32/dict=16   | 3.28 µs  | 3.34 µs  | noise |
| str/dict=256  | 48.72 µs | 47.96 µs | noise |
| i32/dict=256  | 3.33 µs  | 3.21 µs  | −3.3% |
| str/dict=1024 | 34.29 µs | 33.01 µs | −4.2% |
| i32/dict=1024 | 3.79 µs  | 3.74 µs  | −1.7% |

## Test plan

- [x] `cargo test -p parquet --lib -- encodings::rle::`
- [x] `cargo bench -p parquet --bench rle_dict --features experimental` (results above)
- [ ] Verify CI

🤖 Generated with [Claude Code](https://claude.com/claude-code)
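The bounds-check change described in the Summary can be illustrated outside the parquet crate. This is a sketch, not the code from this PR: `in_bounds_all`, `in_bounds_max` and `gather_chunk` are hypothetical names, and the fixed `[i32; 8]` chunk only mirrors the 8-index chunks mentioned in the Why section. It shows that the two checks accept and reject the same chunks (including negative indices) and where the gather fits. Whether LLVM actually emits the SIMD reduction depends on the target and optimisation level; the aarch64 listing is the PR's own observation.

```rust
/// Old form of the check (as quoted in the PR): `.all` stops at the first
/// out-of-range index; per the PR, that early exit blocks autovectorisation.
fn in_bounds_all(idx_chunk: &[i32; 8], dict_len: usize) -> bool {
    idx_chunk.iter().all(|&i| (i as usize) < dict_len)
}

/// New form of the check (as quoted in the PR): fold the whole chunk into its
/// maximum as `u32` with no early exit, then compare once against `dict_len`.
/// A negative `i32` wraps to a value >= 2^31 under `as u32`, so it fails too.
fn in_bounds_max(idx_chunk: &[i32; 8], dict_len: usize) -> bool {
    let max = idx_chunk.iter().fold(0u32, |acc, &i| acc.max(i as u32));
    (max as usize) < dict_len
}

/// Hypothetical helper (not the parquet API): validate one 8-index chunk with
/// the max-reduction, then gather the referenced dictionary values into `out`.
fn gather_chunk<T: Clone>(dict: &[T], idx_chunk: &[i32; 8], out: &mut Vec<T>) {
    assert!(
        in_bounds_max(idx_chunk, dict.len()),
        "dictionary index out of range"
    );
    // Slice indexing below is still bounds-checked by Rust itself; this sketch
    // makes no claim about which of those checks the optimiser removes.
    out.extend(idx_chunk.iter().map(|&i| dict[i as usize].clone()));
}
```

Fixing the chunk length at 8 also sidesteps one edge case: on an empty slice `.all` returns `true`, while the fold form evaluates `0 < dict_len`, so the two would disagree for an empty chunk paired with an empty dictionary.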
