Re: [PR] GH-50026: [C++][Parquet] SIMD-accelerate SBBF probe via branchless autovec [arrow]

via GitHub Sun, 24 May 2026 20:57:20 -0700


dmatth1 commented on PR #50030:
URL: https://github.com/apache/arrow/pull/50030#issuecomment-4531360238


   Branchless body alone (no xsimd kernel) on AVX2: 
    - on clang `-mavx2` it's within noise of the hand-written xsimd kernel in 
every regime
    - on gcc it matches xsimd everywhere except the out-of-L3 regime, where it 
comes in at ~0.79× of xsimd
    - That gap is why this PR ships a separate xsimd kernel for the AVX2 TU 
rather than relying on autovec alone: on clang-only builds the xsimd kernel 
makes essentially no difference, but on gcc/MSVC it pins the `vptest` lowering.
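
For readers following along, the "branchless body" above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names `BlockInsert` and `BlockProbeBranchless` are hypothetical, and the salt values are the ones listed in the Parquet format's split-block Bloom filter spec as I understand it. The idea is to accumulate every missing bit into a single `miss` word and test it once at the end, so the loop has no early exit and the compiler is free to vectorize it.

```cpp
#include <cstdint>

// Salt constants for the 8 words of a 256-bit block (assumed to match the
// Parquet split-block Bloom filter specification).
constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU,
                               0xa2b7289dU, 0x705495c7U, 0x2df1424bU,
                               0x9efc4947U, 0x5c6bfb31U};

// Hypothetical helper: set one bit in each of the 8 words of a block.
inline void BlockInsert(uint32_t* block, uint32_t key) {
  for (int i = 0; i < 8; ++i) {
    // The top 5 bits of key * salt select a bit position in [0, 31].
    block[i] |= uint32_t{1} << ((key * kSalt[i]) >> 27);
  }
}

// Hypothetical helper: branchless probe of a single block. Returns true when
// all 8 selected bits are set (key possibly present), false otherwise.
inline bool BlockProbeBranchless(const uint32_t* block, uint32_t key) {
  uint32_t miss = 0;
  for (int i = 0; i < 8; ++i) {
    const uint32_t mask = uint32_t{1} << ((key * kSalt[i]) >> 27);
    // Collect any selected bit that is absent from the block; no early return.
    miss |= ~block[i] & mask;
  }
  return miss == 0;
}
```

Whether a given compiler turns the final `miss == 0` test into a single `vptest` is exactly the part that varies between clang and gcc in the numbers above, so treat this only as the shape of the loop, not as a performance claim.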
   
   Cache regime sweep: scalar vs xsimd, post-hash probe latency:
   
     | Regime | scalar | xsimd | Speedup |
     |---|---:|---:|---:|
     | Small in-cache (0.5 MiB) | 12.35 ns | 2.48 ns | 5.0× |
     | Medium out-of-L3 (128 MiB) | 18.40 ns | 7.41 ns | 2.5× |
     | Large deep DRAM (1 GiB) | 31.05 ns | 22.10 ns | 1.4× |
   
   These numbers use the `as_batch_bool` xsimd form (~1 cycle faster 
in-cache than the shipped `miss != 0` spelling; out-of-cache regimes 
unchanged) and cover the post-hash probe only (XXH64 excluded), so absolute 
values don't compare directly to the end-to-end commit-body table. The regime 
shape (biggest gain in-cache, smallest in DRAM) holds for the shipped form. 
   
   Can re-bench in-tree with the commit if you want directly-comparable numbers.


