[
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597906#comment-14597906
]
Ryan Blue commented on PARQUET-41:
----------------------------------
Interesting, I hadn't heard about counting bloom filters. But as I think a
bit more about how the Hive ACID stuff works, I don't think they would help.
The base file is rewritten periodically to incorporate changes stored in the
current set of deltas. That would rewrite the bloom filter from scratch, so
there is no need for it to be reversible. Then if you're applying a delta on
top of the base file, you only need to apply the filters to your delta because
those rows entirely replace rows in the base. In that case, you have a static
bloom filter per delta file and static bloom filters in the base file, too.
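
A rough sketch of that argument in Python, not Parquet or Hive code: each
file gets a filter built once at write time, and compaction builds a new
filter from the merged rows instead of removing keys from an old one. The
names BloomFilter, build_file, compact and lookup are hypothetical, as are
the filter sizing and the use of None as a delete marker.

```python
import hashlib


class BloomFilter:
    """Plain (non-counting) bloom filter: supports add and test, no removal."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def build_file(rows):
    """Write a 'file': a copy of the rows plus a filter that never changes."""
    bloom = BloomFilter()
    for key in rows:
        bloom.add(key)
    return dict(rows), bloom


def compact(base, deltas):
    """Fold deltas into the base and rebuild the filter from scratch."""
    merged = dict(base[0])
    for rows, _ in deltas:  # oldest delta first
        merged.update(rows)
    # Assumption: a value of None marks a deleted row and is dropped here.
    live = {k: v for k, v in merged.items() if v is not None}
    return build_file(live)


def lookup(key, base, deltas):
    """Check the newest delta first; a delta row replaces the base row."""
    for rows, bloom in reversed(deltas):
        if bloom.might_contain(key) and key in rows:
            return rows[key]
    rows, bloom = base
    if bloom.might_contain(key) and key in rows:
        return rows[key]
    return None


base = build_file({"a": 1, "b": 2})
deltas = [build_file({"b": 20, "c": 3}), build_file({"a": None})]
new_base = compact(base, deltas)
```

Here lookup("b", base, deltas) is answered from the first delta without
touching the base, and new_base carries a fresh filter over only "b" and "c",
so the deleted key "a" never has to be removed from an existing filter.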
> Add bloom filters to parquet statistics
> ---------------------------------------
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-format, parquet-mr
> Reporter: Alex Levenson
> Assignee: Ferdinand Xu
> Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.