emkornfield commented on code in PR #9628:
URL: https://github.com/apache/arrow-rs/pull/9628#discussion_r3012611033
##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -641,4 +897,197 @@ mod tests {
);
}
}
+
+ /*
+ Ok, so the following is trying to prove in simple terms that folding
an SBBF and
+ building a fresh smaller SBBF from scratch prodcues the exact same bits
+
+ If you insert the same values into a 512-block filter and fold it to
256 blocks,
+ you get a bit-for-bit identical result to just inserting those values
into a
+ 256-block filter directly. The fold doesn't lose information or
scramble anything,
+ it's like you had known the right size all along
+
+ This works because of the 2 lemmas:
+ 1. when you half the filter, each hash's block index divides cleanly
by 2
+ so the hash that went to block `i` in the big filter goes to block
`i/2` in the small one
+ which is exactly where the fold puts it
+ > this is trivial since floor(x/2) == floor(floor(x) / 2) is a basic
math fact
+
+ 2. the bit pattern set _within_ a block depends only on the lower 32
bits of the hash,
+ which doesn't change with filter size. So the same bits get set
regardless!
+ > structually trivial, mask() takes a u32 and uses only the SALT
constants..
Review Comment:
```suggestion
> structually trivial, mask() takes a u32 and uses only the SALT
constants.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]