[ 
https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17324404#comment-17324404
 ] 

ASF GitHub Bot commented on PARQUET-41:
---------------------------------------

jbapple commented on pull request #757:
URL: https://github.com/apache/parquet-mr/pull/757#issuecomment-821928966


   @shannonwells If you use equation 3 and fix the block size as 256 bits and 
the number of inner hash functions as 8, you'll be able to generate something 
akin to figure 1. You can then compare the FPP you calculated with the minimum 
FPP for static filters.
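
A rough sketch of that calculation follows. The paper's equation 3 and figure 1 are not reproduced in this thread, so the code assumes the usual blocked-filter model: the number of keys landing in a block is Poisson-distributed with mean (block bits / bits per key), and each key sets one bit in each of 8 equal lanes of the 256-bit block. The function names `split_block_fpp` and `static_filter_lower_bound` are hypothetical, and the static-filter minimum is taken to be the information-theoretic bound 2^(-bits per key), which is also an assumption about what the comment means.

```python
import math


def split_block_fpp(bits_per_key, block_bits=256, lanes=8):
    """Approximate FPP of a split block Bloom filter.

    Assumption (not taken from the thread): keys per block follow a
    Poisson distribution with mean block_bits / bits_per_key, and every
    key sets exactly one bit in each of `lanes` equal-sized lanes.
    """
    mean = block_bits / bits_per_key
    lane_bits = block_bits // lanes
    # Truncate the infinite Poisson sum far into the upper tail.
    limit = int(mean + 12 * math.sqrt(mean) + 60)

    total = 0.0
    pmf = math.exp(-mean)  # P(block holds 0 keys)
    for i in range(limit + 1):
        # FPP of one block holding i keys: all `lanes` probed bits are set.
        inner = (1.0 - (1.0 - 1.0 / lane_bits) ** i) ** lanes
        total += pmf * inner
        pmf *= mean / (i + 1)  # advance to P(block holds i + 1 keys)
    return total


def static_filter_lower_bound(bits_per_key):
    """Assumed minimum FPP for a static filter: 2^(-bits per key)."""
    return 2.0 ** (-bits_per_key)


if __name__ == "__main__":
    # Tabulate FPP against bits per key, roughly the axes one would plot.
    for c in (6, 8, 10, 12, 16, 20, 24):
        print(c, split_block_fpp(c), static_filter_lower_bound(c))
```

Plotting the two printed columns against bits per key on a log scale should show how far the 256-bit, 8-lane configuration sits above the static-filter minimum under these assumptions.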


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Junjie Chen
>            Priority: Major
>              Labels: filter2, pull-request-available
>             Fix For: format-2.7.0, 1.12.0
>
>
> For row groups with no dictionary, we could still produce a bloom filter. 
> This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
