Jimexist commented on code in PR #3102:
URL: https://github.com/apache/arrow-rs/pull/3102#discussion_r1022220568


##########
parquet/src/bloom_filter/mod.rs:
##########
@@ -79,6 +81,33 @@ fn block_check(block: &Block, hash: u32) -> bool {
 /// A split block Bloom filter
 pub struct Sbbf(Vec<Block>);
 
+// This size should not be so large that we hit a short read too early (although unlikely),
+// but also not too small, to ensure cache efficiency. It is essentially a "guess" of the header
+// size. In the demo test the size is 15 bytes.
+const STEP_SIZE: usize = 16;

Review Comment:
   That's a good idea. Let me change it to a 20-byte read and bail, before switching to a more complicated "guessing" based on the gaps between page indices.
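A minimal sketch of the "20-byte read and bail" approach, assuming the header decoder can report how many bytes it consumed. `HEADER_READ_SIZE`, `ReadError`, `read_header`, and the `decode` callback are hypothetical names for illustration, not the parquet crate's API. In the real reader, `decode` would be the Thrift decoder for the bloom filter header.

```rust
/// Number of bytes read speculatively for the bloom filter header
/// (hypothetical name; 20 matches the size proposed in the comment above).
const HEADER_READ_SIZE: usize = 20;

/// Reasons the speculative header read can fail (hypothetical error type).
#[derive(Debug, PartialEq)]
enum ReadError {
    /// `offset` lies beyond the end of the data.
    OutOfBounds,
    /// The header did not fit in the bytes read, so the caller bails.
    HeaderIncomplete,
}

/// Reads at most `HEADER_READ_SIZE` bytes at `offset` and tries to decode a
/// header from them. `decode` stands in for the real header decoder: it returns
/// the decoded value plus the number of bytes consumed, or `None` if the
/// buffer ended before the header was complete.
fn read_header<T>(
    data: &[u8],
    offset: usize,
    decode: impl Fn(&[u8]) -> Option<(T, usize)>,
) -> Result<(T, usize), ReadError> {
    if offset > data.len() {
        return Err(ReadError::OutOfBounds);
    }
    // Clamp to the end of the data so a header near EOF can still be read.
    let end = data.len().min(offset + HEADER_READ_SIZE);
    // One fixed-size read with no retry: bail if it was not enough.
    decode(&data[offset..end]).ok_or(ReadError::HeaderIncomplete)
}
```

Returning the consumed length lets the caller locate the bitset, which immediately follows the header (at `offset + header_len`). Bailing on `HeaderIncomplete` keeps this first version simple and defers any smarter size estimation.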



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.