emkornfield commented on code in PR #9628:
URL: https://github.com/apache/arrow-rs/pull/9628#discussion_r3012755964


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -431,6 +451,162 @@ impl Sbbf {
         self.0.capacity() * std::mem::size_of::<Block>()
     }
 
+    /// Returns the number of blocks in this bloom filter.
+    pub fn num_blocks(&self) -> usize {
+        self.0.len()
+    }
+
+    /// Fold the bloom filter once, halving its size by merging adjacent block pairs.
+    ///
+    /// This implements an elementary folding operation for Split Block Bloom
+    /// Filters. Each pair of adjacent blocks is combined via bitwise OR:
+    ///
+    /// ```text
+    /// folded[i] = blocks[2*i] | blocks[2*i + 1]    for 0 <= i < num_blocks/2
+    /// ```
+    ///
+    /// ## Why adjacent pairs (not halves)?
+    ///
+    /// Standard Bloom filter folding merges the two halves (`B[i] | B[i + m/2]`) because

Review Comment:
   nit: as an explanation, it might pay to reverse this. I'm not sure whether readers would commonly be aware of Bloom filter folding, so it might be better to first explain why standard folding merges halves, and then indicate why this is different from the linked paper.


