[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

Ryan Blue (JIRA) Tue, 23 Jun 2015 09:37:38 -0700

    [ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597906#comment-14597906
 ]


Ryan Blue commented on PARQUET-41:
----------------------------------

Interesting, I hadn't heard about counting bloom filters. But thinking a bit 
more about how Hive ACID works, I don't think they would help.

The base file is rewritten periodically to incorporate the changes stored in 
the current set of deltas. That rewrite rebuilds the bloom filter from scratch, 
so there is no need for it to be reversible (that is, to support removing 
entries). Then if you're applying a delta on top of the base file, you only 
need to apply the filters to your delta, because those rows entirely replace 
rows in the base. In that case, you have a static bloom filter per delta file 
and static bloom filters in the base file, too.
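
To make the argument concrete, here is a minimal sketch in Python, not 
Parquet or Hive code. The BloomFilter class, build_filter, compact, and the 
use of None as a delete marker are all hypothetical names and encodings chosen 
for illustration. It shows a plain add-only filter built once per file and 
rebuilt from scratch when deltas are folded into the base, so nothing ever has 
to be removed from an existing filter.

```python
import hashlib


class BloomFilter:
    """A plain (non-counting) bloom filter: supports add and lookup only."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_filter(rows):
    """Build a static filter over the keys of one file (base or delta)."""
    bf = BloomFilter()
    for key in rows:
        bf.add(key)
    return bf


def compact(base_rows, deltas):
    """Fold deltas into the base and rebuild the base filter from scratch.

    base_rows: dict of key -> value.
    deltas: list of dicts; a value of None marks a delete (an assumed,
    illustrative encoding, not the actual Hive ACID format).
    """
    merged = dict(base_rows)
    for delta in deltas:
        for key, value in delta.items():
            if value is None:
                merged.pop(key, None)
            else:
                merged[key] = value
    # The old base filter is discarded; no entry is ever removed from it.
    return merged, build_filter(merged)


base = {"a": 1, "b": 2}
delta = {"b": None, "c": 3}  # delete "b", insert "c"
base_filter = build_filter(base)
delta_filter = build_filter(delta)
new_base, new_base_filter = compact(base, [delta])
```

Between compactions, base_filter and delta_filter are never modified. A 
reader checks the delta's filter for the rows that delta touches, and after 
compaction the new base gets a freshly built filter.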

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
