[ https://issues.apache.org/jira/browse/PARQUET-2257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17699705#comment-17699705 ]
ASF GitHub Bot commented on PARQUET-2257:
-----------------------------------------

wgtmac commented on code in PR #194:
URL: https://github.com/apache/parquet-format/pull/194#discussion_r1134163134


##########
src/main/thrift/parquet.thrift:
##########
@@ -753,6 +753,9 @@ struct ColumnMetaData {
   /** Byte offset from beginning of file to Bloom filter data. **/
   14: optional i64 bloom_filter_offset;
+
+  /** Size of Bloom filter data, in bytes. **/
+  15: optional i32 bloom_filter_length;

Review Comment:
   On the writer side:
   - An old writer writes only the offset.
   - A new writer should write the length as well.

   On the reader side:
   - An old reader checks only the offset.
   - A new reader checks the offset, then uses the length if it exists.


> [Format] Add bloom_filter_length to ColumnMetaData
> --------------------------------------------------
>
>                 Key: PARQUET-2257
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2257
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format
>            Reporter: Gang Wu
>            Assignee: Gang Wu
>            Priority: Major
>
> The spec has only added bloom_filter_offset to locate the bloom filter. The
> reader cannot load the bloom filter in a single shot until it parses the
> bloom filter header to get the total size.
> This issue proposes adding an optional bloom_filter_length field to track the
> size of the bloom filter to facilitate I/O scheduling.
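
The reader-side behavior described in the review comment could be sketched roughly as follows. This is a minimal Python illustration, not code from any Parquet implementation: ReadPlan, plan_bloom_filter_read, and HEADER_GUESS_BYTES are hypothetical names, and the sketch assumes that bloom_filter_length, when present, covers everything the reader needs in one read.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed size of the first speculative read when the length is unknown.
# The value is illustrative and is not taken from the Parquet spec.
HEADER_GUESS_BYTES = 256


@dataclass
class ReadPlan:
    offset: int        # byte offset from the beginning of the file
    length: int        # number of bytes to request in the first read
    single_shot: bool  # True if this one read is expected to be sufficient


def plan_bloom_filter_read(
    bloom_filter_offset: Optional[int],
    bloom_filter_length: Optional[int],
) -> Optional[ReadPlan]:
    """Decide how to fetch a column chunk's bloom filter from its metadata."""
    if bloom_filter_offset is None:
        # No offset means no bloom filter was written for this column chunk.
        return None
    if bloom_filter_length is not None and bloom_filter_length > 0:
        # File from a new writer: the size is known, so one read is enough.
        return ReadPlan(bloom_filter_offset, bloom_filter_length, True)
    # File from an old writer: only the offset is known. Read the header
    # first; a second read follows once the header reveals the total size.
    return ReadPlan(bloom_filter_offset, HEADER_GUESS_BYTES, False)
```

Because the new field is optional, a reader written this way still handles files from old writers: it falls back to the header-first path whenever the length is absent.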