[ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597897#comment-14597897 ]
Jason Altekruse commented on PARQUET-41:
----------------------------------------
If you store a bit array, that is true. There is a related data structure (as
far as I understand, it does not have a separate name of its own) that does
allow for deletions. The trade-off is storage size, but this is tunable, just as
the regular bloom filter is, in terms of a trade-off between size and false
positive rate.
http://www.ics.uci.edu/~jsimons/slides/seminar/IBF.pdf
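
To make the deletion idea concrete, here is a minimal sketch of one well-known
variant that supports removal, the counting bloom filter. It is not necessarily
the exact structure described in the linked slides, and nothing here is a
Parquet API: the class name, parameters, and the salted SHA-256 hashing are all
assumptions made for illustration. Each slot holds a small counter instead of a
single bit, which is where the extra storage cost comes from.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative bloom filter variant that keeps counters so items can be removed."""

    def __init__(self, num_counters=1024, num_hashes=4):
        # Both sizes are tunable: more counters lowers the false positive
        # rate at the cost of more storage (hypothetical defaults).
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes slot indexes from salted SHA-256 digests.
        # This hashing scheme is an assumption chosen for simplicity.
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def __contains__(self, item):
        # May return a false positive, never a false negative
        # (as long as only previously added items are removed).
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item):
        # Refuse to decrement if the item is definitely absent.
        if item not in self:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True


f = CountingBloomFilter()
f.add("row-group-key-1")
f.add("row-group-key-2")
f.remove("row-group-key-1")
print("row-group-key-2" in f)
```

Removing an item that was never added but happens to test positive would
corrupt the counters, so a structure like this only works if deletions are
restricted to values known to have been inserted.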
> Add bloom filters to parquet statistics
> ---------------------------------------
>
> Key: PARQUET-41
> URL: https://issues.apache.org/jira/browse/PARQUET-41
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-format, parquet-mr
> Reporter: Alex Levenson
> Assignee: Ferdinand Xu
> Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter.
> This could be very useful in filtering entire row groups.