dmatth1 opened a new pull request, #10011:
URL: https://github.com/apache/arrow-rs/pull/10011

   # Which issue does this PR close?
   
     No tracked issue. This PR is opened directly, following the precedent of
     [apache/arrow-go#336](https://github.com/apache/arrow-go/pull/336), which
     shipped AVX2/SSE4/NEON SBBF (split-block Bloom filter) probes in 18.3.0. It
     also parallels an in-progress
     [[DISCUSS]](https://lists.apache.org/thread/omof0fq47tndfd80g5hwp2bvjmzvpb40)
     thread on `[email protected]` about the C++ port of the same kernel.
   
     # Rationale for this change
   
     `Sbbf::check` / `Sbbf::insert` are on the hot path of Parquet row-group
     skipping for every reader downstream of `arrow-rs` (DataFusion, Databend,
     InfluxDB / IOx, RisingWave, GreptimeDB). Each 256-bit Parquet block is
     exactly one AVX2 vector, so the K=8 lane test collapses to one `vptest`
     (`_mm256_testc_si256`). This PR vectorises that loop on x86_64 without
     changing the algorithm, hash, salts, or wire format. NEON / aarch64 SIMD
     support is slated for a follow-up PR.
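
For readers unfamiliar with the block layout, the scalar operation being vectorised can be sketched roughly as follows. This is a minimal illustration, not the crate's code: the `Block` type and its method names here are hypothetical stand-ins, and the salt constants are the eight values given in the Parquet Bloom filter specification.

```rust
/// Salt constants from the Parquet split-block Bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// One 256-bit block as eight 32-bit words.
/// Hypothetical stand-in for the crate's own `Block` type.
#[derive(Clone, Copy, Default, Debug, PartialEq)]
pub struct Block(pub [u32; 8]);

impl Block {
    /// Builds the per-lane mask: exactly one bit set in each of the eight words.
    fn mask(x: u32) -> [u32; 8] {
        // The top 5 bits of the salted product select a bit position in 0..=31.
        std::array::from_fn(|i| 1u32 << (x.wrapping_mul(SALT[i]) >> 27))
    }

    /// Sets the eight mask bits for `x` in this block.
    pub fn insert(&mut self, x: u32) {
        let m = Self::mask(x);
        for (word, bit) in self.0.iter_mut().zip(m) {
            *word |= bit;
        }
    }

    /// Returns true only if all eight mask bits for `x` are set (K = 8 lanes).
    pub fn check(&self, x: u32) -> bool {
        let m = Self::mask(x);
        self.0.iter().zip(m).all(|(word, bit)| word & bit != 0)
    }
}
```

The eight lanes are independent of each other, which is why the whole test maps onto a single 256-bit vector operation.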
   
     # What changes are included in this PR?
   
     - AVX2 kernel in `simd_x86`, dispatched via a cached
       `is_x86_feature_detected!("avx2")` check (the runtime check is dead code
       when AVX2 is enabled at compile time, e.g. via `-C target-cpu=native`).
     - Scalar `Block::{check,insert}` retained as the production fallback for
       non-AVX2 x86 / aarch64 / wasm32 / RISC-V / 32-bit / big-endian targets,
       and as the correctness reference the AVX2 kernel is diff-tested against.
     - `Block` changed from `#[repr(transparent)]` to `#[repr(C, align(32))]`.
       The byte layout is unchanged; the alignment is asserted at compile time,
       so the aligned AVX2 loads and stores can rely on it.
     - `parquet/benches/bloom_filter.rs` gains `bench_check` (miss/hit × three
       cache regimes) and `bench_insert`, both exercising the public API.
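
The shape of the changes listed above can be illustrated with a rough, self-contained sketch. It is not the code in `simd_x86`: the free functions `check`, `insert`, `check_scalar`, `insert_scalar` and the `*_avx2` helpers are hypothetical names chosen for illustration, and the dispatch here calls `is_x86_feature_detected!` directly rather than going through whatever cache the PR uses.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Salt constants from the Parquet split-block Bloom filter specification.
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

/// Hypothetical stand-in for the crate's `Block`: 32 bytes, 32-byte aligned.
#[repr(C, align(32))]
#[derive(Clone, Copy, Default)]
pub struct Block(pub [u32; 8]);

// Compile-time checks: the build fails if size or alignment ever drifts.
const _: () = assert!(std::mem::size_of::<Block>() == 32);
const _: () = assert!(std::mem::align_of::<Block>() == 32);

/// Scalar reference: all eight per-lane mask bits must be set.
fn check_scalar(block: &Block, x: u32) -> bool {
    (0..8).all(|i| block.0[i] & (1u32 << (x.wrapping_mul(SALT[i]) >> 27)) != 0)
}

/// Scalar reference: set one bit in each of the eight words.
fn insert_scalar(block: &mut Block, x: u32) {
    for i in 0..8 {
        block.0[i] |= 1u32 << (x.wrapping_mul(SALT[i]) >> 27);
    }
}

/// Builds the eight-lane mask in one vector: 1 << ((x * SALT[i]) >> 27).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn mask_avx2(x: u32) -> __m256i {
    let salts = _mm256_loadu_si256(SALT.as_ptr() as *const __m256i);
    let product = _mm256_mullo_epi32(_mm256_set1_epi32(x as i32), salts);
    let shift = _mm256_srli_epi32::<27>(product);
    _mm256_sllv_epi32(_mm256_set1_epi32(1), shift)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn check_avx2(block: &Block, x: u32) -> bool {
    // Aligned load is valid because Block is align(32).
    let bits = _mm256_load_si256(block as *const Block as *const __m256i);
    // testc returns 1 iff (!bits & mask) == 0, i.e. every mask bit is set.
    _mm256_testc_si256(bits, mask_avx2(x)) != 0
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn insert_avx2(block: &mut Block, x: u32) {
    let p = block as *mut Block as *mut __m256i;
    _mm256_store_si256(p, _mm256_or_si256(_mm256_load_si256(p), mask_avx2(x)));
}

/// Dispatching wrapper: AVX2 when available at runtime, scalar otherwise.
pub fn check(block: &Block, x: u32) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was just detected and Block is 32-byte aligned.
            return unsafe { check_avx2(block, x) };
        }
    }
    check_scalar(block, x)
}

/// Dispatching wrapper for insertion, mirroring `check`.
pub fn insert(block: &mut Block, x: u32) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was just detected and Block is 32-byte aligned.
            return unsafe { insert_avx2(block, x) };
        }
    }
    insert_scalar(block, x)
}
```

On targets other than x86_64 the `cfg` blocks disappear and only the scalar path is compiled, which matches the fallback role described in the list.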
   
     # Are these changes tested?
   
     Yes. The 31 pre-existing `bloom_filter` unit tests continue to pass on
     x86_64 with and without `-C target-cpu=native`. Two new diff tests,
     `test_simd_{check,insert}_matches_scalar`, assert bit-identical
     AVX2-vs-scalar output across 10K random `(block, hash)` pairs each.
     Benchmark results (Cascade Lake-class Xeon) are in the commit message.
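
The general shape of such a diff test can be sketched as below. This is only an illustration of the technique, not the tests in the PR: `first_check_mismatch` and `SplitMix64` are hypothetical helpers, the kernels are passed in as closures over a plain `[u32; 8]`, and a small hand-rolled PRNG is used so the sketch needs no external crates.

```rust
/// SplitMix64: a tiny deterministic PRNG (hypothetical helper for this sketch).
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uses the high 32 bits of the 64-bit output.
    fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}

/// Runs both kernels on `pairs` random `(block, hash)` inputs and returns the
/// first input on which they disagree, or `None` if they always agree.
pub fn first_check_mismatch(
    reference: impl Fn(&[u32; 8], u32) -> bool,
    candidate: impl Fn(&[u32; 8], u32) -> bool,
    pairs: usize,
    seed: u64,
) -> Option<([u32; 8], u32)> {
    let mut rng = SplitMix64(seed);
    for _ in 0..pairs {
        let block: [u32; 8] = std::array::from_fn(|_| rng.next_u32());
        let hash = rng.next_u32();
        if reference(&block, hash) != candidate(&block, hash) {
            return Some((block, hash));
        }
    }
    None
}
```

A fixed seed keeps failures reproducible. Returning the offending pair rather than a bare boolean makes a mismatch easy to turn into a regression case.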
   
     # Are there any user-facing changes?
   
     No. Public API, MSRV, dependencies, and wire format are all unchanged. The
     only observable effect is faster `Sbbf::check` / `Sbbf::insert` on x86_64
     hosts with AVX2.
   
     ---
   
     The SIMD kernel was drafted with AI assistance and reviewed line-by-line;
     correctness is enforced in CI by the diff tests above.
     `cargo fmt --all -- --check` and
     `cargo clippy -p parquet --all-targets -- -D warnings` both run clean on
     this branch.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
