[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597222#comment-14597222
]
Ferdinand Xu edited comment on PARQUET-41 at 6/23/15 7:23 AM:
--------------------------------------------------------------
Hi,
Any suggestions or comments on my current solution?
I'm also thinking about using the Bloom filter API from Guava instead of
implementing our own.
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
As a first step, we should finalize what to store in parquet-format.
If we use Guava, we would store the expected insertions and the false
positive probability, which could differ from the current solution.
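To make the storage question concrete, here is a minimal sketch, assuming the standard Bloom filter sizing formulas, of how the two values passed to Guava's create(Funnel, int, double) (expected insertions and false positive probability) determine the bit count and the number of hash functions. The class and method names are hypothetical and are not part of Guava or Parquet; the one million distinct values per row group is only an illustrative assumption.

```java
// Hypothetical helper, not part of Guava or Parquet. It only illustrates how
// expected insertions (n) and false positive probability (p) fix the filter size.
class BloomFilterSizing {

    // Number of bits: m = -n * ln(p) / (ln 2)^2, rounded up.
    static long optimalNumOfBits(long expectedInsertions, double fpp) {
        if (expectedInsertions <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2));
    }

    // Number of hash functions: k = (m / n) * ln 2, rounded, and at least 1.
    static int optimalNumOfHashFunctions(long expectedInsertions, long numBits) {
        double k = (double) numBits / expectedInsertions * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // assumed number of distinct values in one row group
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            long m = optimalNumOfBits(n, p);
            int k = optimalNumOfHashFunctions(n, m);
            System.out.printf("fpp=%.3f bits=%d (~%d KiB) hashes=%d%n",
                    p, m, m / 8 / 1024, k);
        }
    }
}
```

Whichever pair ends up in parquet-format (n and p, or the derived bit count and hash count), a reader would need enough information to recover the same m and k that the writer used.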
Regarding the comments from [~dwhite], we could discuss multi-strategy
support here. We could also discuss how to achieve the fallback for the
bloom filter, as [~spena] suggests.
Thank you!
was (Author: ferd):
Hi,
I'm thinking about using the Bloom filter API from Guava instead of
implementing our own. Any suggestions or comments?
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
Thank you!
> Add bloom filters to parquet statistics
> ---------------------------------------
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-format, parquet-mr
> Reporter: Alex Levenson
> Assignee: Ferdinand Xu
> Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.