[I] Adding Compression for BloomFilter [parquet-format]

via GitHub Sat, 22 Jun 2024 23:32:17 -0700


asfimport opened a new issue, #408:
URL: https://github.com/apache/parquet-format/issues/408


   In current Parquet implementations, if the writer doesn't set the ndv 
(number of distinct values) for a BloomFilter, most implementations assume an 
ndv of 1M and size the filter from that value and the fpp (false positive 
probability). So, if fpp is 0.01, the BloomFilter may grow to 2 MB for each 
column, which is really huge. Should we support compression for BloomFilter, 
like:
   
    
   
   ```
   
   /**
    * The compression used in the Bloom filter.
    **/
   struct Uncompressed {}
   union BloomFilterCompression {
     1: Uncompressed UNCOMPRESSED;
   +2: CompressionCodec COMPRESSION;
   }
   
   ```
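
For reference, the 2 MB figure can be reproduced with a rough sketch. It assumes the sizing rule commonly used for split-block Bloom filters (8 bits set per value, bit count rounded up to a power of two) and leaves out the minimum/maximum size clamps that real writers apply. The names `optimal_num_bits` and `sparse_filter_bytes` are hypothetical helpers, not part of any Parquet API. The second helper only imitates a lightly filled filter by setting random bits, to illustrate why a filter sized for 1M values but holding far fewer might compress well.

```python
import math
import random
import zlib


def optimal_num_bits(ndv: int, fpp: float) -> int:
    """Bits for a split-block Bloom filter holding `ndv` values at rate `fpp`.

    Assumed sizing rule: 8 bits are set per value, and the result is
    rounded up to a power of two. Size clamps are omitted here.
    """
    raw_bits = int(-8 * ndv / math.log(1 - fpp ** (1 / 8)))
    return 1 << (raw_bits - 1).bit_length()  # next power of two >= raw_bits


def sparse_filter_bytes(num_bytes: int, num_values: int, seed: int = 0) -> bytes:
    """Stand-in for a lightly filled filter: 8 random bits set per value."""
    rng = random.Random(seed)
    buf = bytearray(num_bytes)
    for _ in range(num_values * 8):
        bit = rng.randrange(num_bytes * 8)
        buf[bit >> 3] |= 1 << (bit & 7)
    return bytes(buf)


# Default guess from the issue: ndv = 1M, fpp = 0.01.
size_bytes = optimal_num_bits(1_000_000, 0.01) // 8
print(size_bytes)  # 2097152 bytes, i.e. 2 MiB

# A column with only 1,000 distinct values leaves the bitset mostly zero.
bitset = sparse_filter_bytes(size_bytes, num_values=1_000)
print(len(zlib.compress(bitset)), "bytes after zlib, down from", size_bytes)
```

A filter that is close to its design capacity has roughly half of its bits set and would compress far less, so any gain depends on how much the default ndv overestimates the real one.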
   
   **Reporter**: [Xuwei 
Fu](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mwish) / 
@mapleFU
   
   <sub>**Note**: *This issue was originally created as 
[PARQUET-2256](https://issues.apache.org/jira/browse/PARQUET-2256). Please see 
the [migration 
documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further 
details.*</sub>

