[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597897#comment-14597897
 ] 

Jason Altekruse commented on PARQUET-41:
----------------------------------------

If you store a bit array, that is true. There is a related data structure
(as I understood it, there is no alternative name for it) that does allow for
deletions. The trade-off is storage size, but this is tunable, just as the
regular bloom filter is, in terms of a trade-off between size and false
positive rate.

http://www.ics.uci.edu/~jsimons/slides/seminar/IBF.pdf
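
To make the size-versus-deletions trade-off concrete, here is a minimal sketch
of a counting Bloom filter, one well-known variant that replaces each bit with
a small counter so that items can be removed. This is an assumption chosen for
illustration: it is not necessarily the structure described in the slides
linked above, and the class and method names below are hypothetical, not part
of any Parquet API.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative Bloom filter variant with one counter per slot instead of one bit."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        # More counters lower the false positive rate but cost more storage.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _slots(self, item: str):
        # Derive num_hashes slot indexes by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item: str) -> None:
        for slot in self._slots(item):
            self.counters[slot] += 1

    def might_contain(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.counters[slot] > 0 for slot in self._slots(item))

    def remove(self, item: str) -> None:
        # Refuse to remove an item the filter has definitely never seen,
        # since that would push counters below zero.
        if not self.might_contain(item):
            raise KeyError(item)
        for slot in self._slots(item):
            self.counters[slot] -= 1


cbf = CountingBloomFilter(num_counters=1024, num_hashes=3)
cbf.add("row-group-value")
present_before = cbf.might_contain("row-group-value")
cbf.remove("row-group-value")
present_after = cbf.might_contain("row-group-value")
```

Each slot here holds a full integer rather than a single bit, which is where
the extra storage goes. Removing an item that was never added but happens to
be a false positive would still corrupt the counters, so a real implementation
would need to guard against that.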

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Ferdinand Xu
>              Labels: filter2
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)