[jira] [Comment Edited] (PARQUET-41) Add bloom filters to parquet statistics

Ferdinand Xu (JIRA) Tue, 23 Jun 2015 00:24:38 -0700

    [ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597222#comment-14597222 ]


Ferdinand Xu edited comment on PARQUET-41 at 6/23/15 7:23 AM:
--------------------------------------------------------------

Hi,
Any suggestions or comments about my current solution?
I'm also thinking about using the Bloom Filter API from Guava instead of
implementing our own.
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
As a first step, we should finalize what to store in parquet-format.
If we use Guava, we would store the expected number of insertions and the false
positive probability, which may differ from what the current solution stores.
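
To make concrete what storing those two parameters implies, here is a minimal Java sketch that derives the bit-array size m and the hash-function count k from the expected insertions n and the false positive probability p. It uses the standard Bloom filter formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2. The class and method names are hypothetical (not Guava's or parquet-mr's API), and Guava's own rounding may differ slightly, so treat the exact numbers as an assumption.

```java
/**
 * Hypothetical helper: derives Bloom filter geometry from the two values that
 * would be stored in parquet-format, expected insertions n and false positive
 * probability p, using the standard textbook formulas
 *   m = -n * ln(p) / (ln 2)^2   (number of bits)
 *   k = (m / n) * ln 2          (number of hash functions)
 */
final class BloomFilterSizing {
    private BloomFilterSizing() {}

    /** Bits needed so that n insertions give a false positive rate of about p. */
    static long optimalNumOfBits(long expectedInsertions, double fpp) {
        if (expectedInsertions <= 0) {
            throw new IllegalArgumentException("expectedInsertions must be positive");
        }
        if (!(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("fpp must be in (0, 1)");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2));
    }

    /** Hash-function count that minimizes the false positive rate for m bits and n insertions. */
    static int optimalNumOfHashFunctions(long expectedInsertions, long numBits) {
        if (expectedInsertions <= 0 || numBits <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return Math.max(1, (int) Math.round((double) numBits / expectedInsertions * Math.log(2)));
    }
}
```

For example, 1,000,000 expected insertions at a 1% false positive probability work out to roughly 9.6 million bits (about 1.2 MB) and 7 hash functions. One point that may be worth weighing for parquet-format: if only n and p are stored, every reader has to reproduce the writer's sizing and rounding exactly, whereas storing the resulting bit count and hash count next to the bitset avoids that dependency.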

Regarding the comments from [~dwhite], we could hold the discussion of
multi-strategy support here. We could also discuss how to implement the
fallback for the bloom filter, as [~spena] suggests.

Thank you!


was (Author: ferd):
Hi,
I'm thinking about using the Bloom Filter API from Guava instead of
implementing our own. Any suggestions or comments?
[http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/hash/BloomFilter.html#create(com.google.common.hash.Funnel, int, double)]
Thank you!

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
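
As an illustration of the row-group skipping described in the issue, the following is a minimal, self-contained Java sketch, not parquet-mr code. It assumes a long-valued column, hashes values with the SplitMix64 finalizer, and derives the k probe positions by double hashing (h1 + i * h2 mod m). RowGroupBloomFilter, put, and mightContain are hypothetical names. Because a Bloom filter has no false negatives, a reader evaluating an equality predicate can skip the whole row group whenever mightContain returns false.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of a Bloom filter over one long-valued column chunk of a
 * row group. Bloom filters have no false negatives, so if mightContain(v)
 * returns false, no row in the row group can satisfy "column == v" and the
 * reader may skip the entire row group.
 */
final class RowGroupBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashFunctions;

    RowGroupBloomFilter(int expectedInsertions, double fpp) {
        if (expectedInsertions <= 0 || !(fpp > 0.0 && fpp < 1.0)) {
            throw new IllegalArgumentException("need expectedInsertions > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        // Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hash functions.
        this.numBits = (int) Math.max(64, Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2)));
        this.numHashFunctions = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    /** SplitMix64 finalizer: turns a 64-bit value into a well-mixed 64-bit hash. */
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    /** First hash, reduced to [0, m). */
    private long hash1(long value) {
        return Math.floorMod(mix64(value), (long) numBits);
    }

    /** Second hash (the stride), reduced to [1, m) so the probes never all land on one bit. */
    private long hash2(long value) {
        long h = Math.floorMod(mix64(value ^ 0x9e3779b97f4a7c15L), (long) numBits);
        return h == 0 ? 1 : h;
    }

    /** Bit index of the i-th probe: g_i = (h1 + i * h2) mod m (Kirsch-Mitzenmacher double hashing). */
    private int probe(long h1, long h2, int i) {
        return (int) ((h1 + i * h2) % numBits);
    }

    /** Writer side: record one value of the column chunk. */
    void put(long value) {
        long h1 = hash1(value);
        long h2 = hash2(value);
        for (int i = 0; i < numHashFunctions; i++) {
            bits.set(probe(h1, h2, i));
        }
    }

    /** Reader side: false means "definitely absent"; true means "possibly present". */
    boolean mightContain(long value) {
        long h1 = hash1(value);
        long h2 = hash2(value);
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(probe(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}
```

In this sketch a writer would call put for every value of the column chunk, and a reader would test the predicate's literal with mightContain before reading any pages. How the bitset is serialized into the file is left open here, since that is exactly what the parquet-format discussion needs to decide.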



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
