[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555118#comment-17555118 ]

ASF GitHub Bot commented on PARQUET-2157:
-----------------------------------------

huaxingao commented on PR #975:
URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1157762209

   > it should be good enough to also check the lower limit, e.g.
   > `exist > totalCount * (testFpp[i] * 0.9)`, or
   > `exist > totalCount * (testFpp[i] * 0.5)`, or even `exist > 0`. What do
   > you think? This way, we'll be certain the test does not pass merely
   > because `exist` is 0.
   
   Thanks for the suggestion! I couldn't find a reliable number for the lower 
limit, so I used `exist > 0`. 
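
   For illustration, the two-sided check discussed here could look roughly like the sketch below. The class `FppBounds`, the method `withinBounds`, and the 1.1 upper slack factor are assumptions made for this example and are not taken from the PR; only the `exist > 0` lower limit comes from the comment above.

```java
// Hypothetical helper for illustration only; not part of parquet-mr or the PR.
final class FppBounds {
    private FppBounds() {}

    /**
     * Returns true when the observed false-positive count is non-zero
     * (the lower limit chosen in the comment) and stays below the expected
     * count, totalCount * fpp, scaled by an assumed upper slack factor.
     */
    static boolean withinBounds(long exist, long totalCount, double fpp, double upperSlack) {
        boolean sawFalsePositive = exist > 0;
        boolean belowUpperLimit = exist < totalCount * fpp * upperSlack;
        return sawFalsePositive && belowUpperLimit;
    }

    public static void main(String[] args) {
        long totalCount = 100_000;
        double fpp = 0.01;       // the previously hardcoded value named in the issue
        double upperSlack = 1.1; // assumed tolerance, not a value from the PR

        System.out.println(withinBounds(950, totalCount, fpp, upperSlack));
        System.out.println(withinBounds(0, totalCount, fpp, upperSlack));
    }
}
```

   The `exist > 0` part only guards against the test passing trivially when no false positives are observed at all; it does not verify that the observed rate is close to the configured fpp.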




> Add BloomFilter fpp config
> --------------------------
>
>                 Key: PARQUET-2157
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2157
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Huaxin Gao
>            Priority: Major
>
> Currently parquet-mr hardcodes the bloom filter fpp (false positive probability) 
> to 0.01. We should add a config that lets users specify the fpp.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
