Hi all

As discussed in the sync-up meeting, I'd like to propose a vote on the Bloom
filter design doc
<https://docs.google.com/document/d/1mIZ0W24Cr79QHJWN1sQ3dIUc4lAK5AVqozwSwtpFhW8/edit?usp=sharing>
and its corresponding parquet-format PR
<https://github.com/apache/parquet-format/pull/99>, so that we can move
forward with updating the Parquet spec and implementing the read/write side.

What we have done so far:

    A PoC benchmark
<https://docs.google.com/spreadsheets/d/1yV3u-P_yY4DtfSty3LPrbhwuJx4cqm_YeK61s2v0OLU/edit?usp=sharing>.
It compares performance with and without the Bloom filter, and the Bloom
filter against the dictionary filter. The results show a promising
improvement for selective queries.

    Bloom filter utility class implementations in Java and C++.

This vote is to determine whether the Parquet committers accept the Bloom
filter design and its corresponding parquet-format changes.

+1: Accept the design and the related parquet-format changes
+0: ...
-1: Because ...


Thanks & Best Regards
