etseidl commented on code in PR #10011:
URL: https://github.com/apache/arrow-rs/pull/10011#discussion_r3325012810
##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -229,52 +238,24 @@ impl Block {
fn insert(&mut self, hash: u32) {
let mask = Self::mask(hash);
for i in 0..8 {
- self[i] |= mask[i];
+ self.0[i] |= mask.0[i];
}
}
- /// Check membership: returns `true` when *every* bit from `mask(hash)` is
- /// already set in this block (`block[i] & mask[i] != 0` for all 8 words).
+ /// Check membership: `true` iff every bit in `mask(hash)` is set in this
+ /// block (i.e. the value was probably inserted). `false` is definitive.
///
- /// A `true` result means "probably present" (other inserts may have set
- /// the same bits). A `false` is definitive — the value was never inserted.
+ /// Branchless `acc |= !block & mask; acc == 0` — the "testc" reduction
+ /// shape LLVM autovectorizes. A short-circuiting `.all()` would win a
+ /// few lanes on miss but defeat vectorization.
+ #[inline]
fn check(&self, hash: u32) -> bool {
let mask = Self::mask(hash);
+ let mut acc = 0u32;
for i in 0..8 {
- if self[i] & mask[i] == 0 {
- return false;
- }
- }
- true
- }
-}
-
-impl std::ops::Index<usize> for Block {
Review Comment:
Right you are. Had Sbbf in my head 😅
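
For readers following the thread, the branchless `check` described in the new doc comment can be sketched in isolation. This is a minimal standalone sketch, not the PR's code: the tuple-struct `Block([u32; 8])` is inferred from the `self.0[i]` accesses in the hunk, the `mask` body and `SALT` values follow the Parquet split-block Bloom filter scheme as I understand it (treat them as assumptions), and the names `check_branchless` and `check_short_circuit` are hypothetical, chosen only to put the two shapes side by side.

```rust
// Assumed salt constants (Parquet split-block Bloom filter scheme).
const SALT: [u32; 8] = [
    0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
    0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31,
];

#[derive(Clone, Copy, Default)]
struct Block([u32; 8]);

impl Block {
    /// Assumed mask construction: exactly one bit set in each of the 8 words.
    fn mask(hash: u32) -> Block {
        let mut m = [0u32; 8];
        for i in 0..8 {
            // Top 5 bits of the salted hash select a bit position in 0..32.
            m[i] = 1u32 << (hash.wrapping_mul(SALT[i]) >> 27);
        }
        Block(m)
    }

    fn insert(&mut self, hash: u32) {
        let mask = Self::mask(hash);
        for i in 0..8 {
            self.0[i] |= mask.0[i];
        }
    }

    /// Branchless shape: accumulate every mask bit that is missing from the
    /// block, then compare once. No early exit, so the loop has a fixed trip
    /// count and no data-dependent branch.
    fn check_branchless(&self, hash: u32) -> bool {
        let mask = Self::mask(hash);
        let mut acc = 0u32;
        for i in 0..8 {
            acc |= !self.0[i] & mask.0[i];
        }
        acc == 0
    }

    /// Short-circuiting shape, shown only for comparison: stops at the first
    /// word whose mask bit is not set.
    fn check_short_circuit(&self, hash: u32) -> bool {
        let mask = Self::mask(hash);
        (0..8).all(|i| self.0[i] & mask.0[i] != 0)
    }
}
```

Because each mask word carries a single bit, `block & mask != 0` for a word is equivalent to `!block & mask == 0`, so both functions return the same answer for every block and hash; they differ only in control flow. Whether the branchless form actually vectorizes depends on the compiler and target, so that is worth confirming in the generated assembly rather than assuming.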