Hi all,
As discussed in the sync-up meeting, I'd like to propose a vote on the Bloom
filter design doc
<https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit?usp=sharing> and
its corresponding parquet-format PR
<https://github.com/apache/parquet-format/pull/99>, so that we can move
forward with updating the Parquet spec and implementing the read and write sides.
What we have done includes:
The PoC benchmark
<https://docs.google.com/spreadsheets/d/1yV3u-P_yY4DtfSty3LPrbhwuJx4cqm_YeK61s2v0OLU/edit?usp=sharing>.
It compares queries with and without the Bloom filter, and the Bloom filter
against the dictionary filter. The results show a promising improvement for
selective queries.
Bloom filter utility class implementations in Java and C++.
This vote is to determine whether the Parquet committers accept the Bloom
filter design and its corresponding parquet-format changes.
+1: Accept the design and the related parquet-format changes
+0: ...
-1: Because ...
Thanks & Best Regards