Xuwei Fu created PARQUET-2256:
---------------------------------
Summary: Adding Compression for BloomFilter
Key: PARQUET-2256
URL: https://issues.apache.org/jira/browse/PARQUET-2256
Project: Parquet
Issue Type: Improvement
Components: parquet-cpp
Affects Versions: format-2.9.0
Reporter: Xuwei Fu
In current Parquet implementations, if the ndv (number of distinct values)
is not set for a BloomFilter, most implementations assume an ndv of 1M and
size the filter from that value and the fpp (false-positive probability).
So, with an fpp of 0.01, the BloomFilter can grow to about 2 MB for each
column, which is really huge. Should we support compression for BloomFilter,
like:
```
/**
 * The compression used in the Bloom filter.
 **/
struct Uncompressed {}
union BloomFilterCompression {
  1: Uncompressed UNCOMPRESSED;
  2: CompressionCodec COMPRESSION;  // proposed addition
}
```
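For reference, the roughly 2 MB per-column size mentioned above can be
reproduced with a short Python sketch. It assumes the usual split-block
sizing formula, bits = -8 * ndv / ln(1 - fpp^(1/8)), rounded up to the next
power of two, which is my understanding of what the existing implementations
do. The function name is hypothetical and is not taken from any Parquet API.

```python
import math


def estimated_bloom_filter_bytes(ndv: int, fpp: float) -> int:
    """Estimate the Bloom filter bitset size in bytes (illustrative only).

    Assumption: split-block sizing, bits = -8 * ndv / ln(1 - fpp^(1/8)),
    rounded up to the next power of two. Implementation-specific minimum
    and maximum size limits are ignored here.
    """
    raw_bits = -8.0 * ndv / math.log(1.0 - fpp ** (1.0 / 8.0))
    # Round up to the next power of two using integer arithmetic.
    rounded_bits = 1 << (math.ceil(raw_bits) - 1).bit_length()
    return rounded_bits // 8


# Default guess described above: ndv = 1M, fpp = 0.01.
size_bytes = estimated_bloom_filter_bytes(1_000_000, 0.01)
print(size_bytes)  # 2097152 bytes, i.e. 2 MiB
```

Under these assumptions the unrounded size is about 1.2 MB, and the
power-of-two rounding is what pushes it to 2 MiB per column.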