Csaba Ringhofer created PARQUET-1981:
----------------------------------------
Summary: Consider adding BloomFilterHeader to ColumnMetaData
Key: PARQUET-1981
URL: https://issues.apache.org/jira/browse/PARQUET-1981
Project: Parquet
Issue Type: Improvement
Components: parquet-format
Reporter: Csaba Ringhofer
Currently, ColumnMetaData only contains bloom_filter_offset, which points to a
BloomFilterHeader followed by the bloom filter data.
This layout is not optimal for reading, as two IO reads are needed once we
know bloom_filter_offset: one to read the header, which contains the size of
the bloom filter, and another to read the actual bloom filter into a buffer.
Having the size stored next to bloom_filter_offset would allow this to be done
in a single read.
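
To illustrate the difference in read counts, here is a minimal sketch. It is
not parquet-format or parquet-mr code: the real BloomFilterHeader is
Thrift-encoded and variable-length, while this example assumes a hypothetical
fixed 4-byte little-endian length prefix as a stand-in. CountingFile,
read_filter_two_step and read_filter_single are illustrative names only.

```python
import io
import struct

# Assumption: a fixed 4-byte length prefix stands in for the real,
# Thrift-encoded (variable-length) BloomFilterHeader.
HEADER_SIZE = 4


class CountingFile:
    """In-memory file that counts positioned reads (one per IO request)."""

    def __init__(self, data):
        self._f = io.BytesIO(data)
        self.reads = 0

    def read_at(self, offset, size):
        self.reads += 1
        self._f.seek(offset)
        return self._f.read(size)


def read_filter_two_step(f, offset):
    """Only the offset is known: read the header first, then the bitset."""
    (num_bytes,) = struct.unpack("<I", f.read_at(offset, HEADER_SIZE))
    return f.read_at(offset + HEADER_SIZE, num_bytes)


def read_filter_single(f, offset, total_size):
    """Offset and total size are known: fetch header and bitset together."""
    buf = f.read_at(offset, total_size)
    (num_bytes,) = struct.unpack("<I", buf[:HEADER_SIZE])
    return buf[HEADER_SIZE:HEADER_SIZE + num_bytes]


# Build a fake file: 10 bytes of unrelated data, then header + bitset.
bitset = bytes(range(32))
blob = b"\x00" * 10 + struct.pack("<I", len(bitset)) + bitset

two_step_file = CountingFile(blob)
two_step_result = read_filter_two_step(two_step_file, 10)

single_file = CountingFile(blob)
single_result = read_filter_single(single_file, 10, HEADER_SIZE + len(bitset))
```

In this sketch both calls return the same bitset, but the two-step variant
issues two reads while the single-read variant issues one. The total_size
argument plays the role of the size that would be stored in the metadata.
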
Having algorithm/hash/compression there could also be useful, as it would allow
skipping the read of the bloom filter if one of those parameters is not
supported.
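
A reader-side check along these lines might look like the sketch below. The
dictionary layout and the can_use_bloom_filter helper are hypothetical; the
values BLOCK, XXHASH and UNCOMPRESSED are the options currently defined for
BloomFilterHeader in parquet.thrift, represented here as plain strings.

```python
# Assumption: the reader supports only the options currently defined in
# parquet.thrift, modelled here as plain strings.
SUPPORTED = {
    "algorithm": {"BLOCK"},
    "hash": {"XXHASH"},
    "compression": {"UNCOMPRESSED"},
}


def can_use_bloom_filter(params):
    """Return True only if every parameter is present and supported.

    `params` is a hypothetical dict holding the header fields that would be
    available from the column metadata, before any bloom filter IO is done.
    """
    return all(params.get(key) in allowed for key, allowed in SUPPORTED.items())


known = {"algorithm": "BLOCK", "hash": "XXHASH", "compression": "UNCOMPRESSED"}
unknown_hash = {"algorithm": "BLOCK", "hash": "MURMUR3", "compression": "UNCOMPRESSED"}
```

With such a check, a reader could decide to skip the bloom filter (as for
unknown_hash above) without issuing any read at bloom_filter_offset.
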
--
This message was sent by Atlassian Jira
(v8.3.4#803005)