[GitHub] [arrow-rs] Jimexist commented on pull request #3102: parquet bloom filter part II: read sbbf bitset from row group reader, update API, and add cli demo

GitBox Mon, 14 Nov 2022 17:17:27 -0800


Jimexist commented on PR #3102:
URL: https://github.com/apache/arrow-rs/pull/3102#issuecomment-1314625409


   > Once we have read the file metadata we know the byte ranges of the column 
chunks, and page indexes, as well as the offsets of the bloom filter data for 
each column chunk. It should therefore be possible to get a fairly accurate 
overestimate of the length of each bloom filter, simply by process of 
elimination.
   
   Thanks for the suggestion. I wonder whether that is future-proof: if more 
data structures are added later besides sbbf, page index, etc., would that be 
a problem? Thinking out loud: that would just balloon the over-estimate 
and/or increase the likelihood of needing to look at both locations before 
the reader can tell which one was used when the parquet file was written.
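
   To make the "process of elimination" idea concrete, here is a minimal 
sketch, not the parquet crate's API. The function name, its parameters, and 
the flat list of offsets are all hypothetical; it assumes the start offsets 
of every known structure (column chunks, page indexes, bloom filters) have 
already been collected from the file metadata, along with an upper bound such 
as the start of the footer.

```rust
/// Hypothetical helper (not part of the parquet crate): for each bloom
/// filter offset, return `(offset, estimated_len)`, where the estimate is
/// the distance to the next known structure boundary.
fn estimate_bloom_filter_lengths(
    bloom_offsets: &[u64],
    other_starts: &[u64], // assumed: starts of column chunks, page indexes, ...
    upper_bound: u64,     // assumed: e.g. where the footer metadata begins
) -> Vec<(u64, u64)> {
    // Gather every known boundary into one sorted, de-duplicated list.
    let mut boundaries: Vec<u64> = bloom_offsets
        .iter()
        .chain(other_starts.iter())
        .copied()
        .collect();
    boundaries.push(upper_bound);
    boundaries.sort_unstable();
    boundaries.dedup();

    bloom_offsets
        .iter()
        .map(|&start| {
            // Index of the first boundary strictly greater than `start`.
            let idx = boundaries.partition_point(|&b| b <= start);
            let end = boundaries.get(idx).copied().unwrap_or(upper_bound);
            // Never an underestimate as long as the listed starts are accurate.
            (start, end.saturating_sub(start))
        })
        .collect()
}
```

   In this sketch, any structure the reader does not know about only widens 
the gap to the next known boundary, so the estimate grows but remains an 
over-estimate.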


