[ 
https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555041#comment-17555041
 ] 

ASF GitHub Bot commented on PARQUET-2157:
-----------------------------------------

ggershinsky commented on PR #975:
URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1157577513

   > The test takes about 2300 milliseconds on my laptop.
   
   OK, this is reasonable. If this run time is enough to test the upper limit 
of the FPP reliably, it should also be enough to check the lower limit, e.g. 
`exist > totalCount * (testFpp[i] * 0.9)`, or 
`exist > totalCount * (testFpp[i] * 0.5)`, or even `exist > 0`. What do you 
think? That way, we'll be certain the test does not pass merely because 
`exist` is 0.
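
As a rough illustration of the two-sided check suggested above, here is a minimal sketch. The class name `FppBoundsCheck`, the helper `withinBounds`, and the `lowerFactor`/`upperFactor` parameters are hypothetical and are not taken from the PR; the upper factor in particular is an assumption, since the test's actual upper limit is not quoted here. Only `exist`, `totalCount` and the per-case fpp correspond to the names in the comment.

```java
public class FppBoundsCheck {

    /**
     * Returns true when the observed false-positive count lies strictly
     * between the lower and upper bounds derived from the configured fpp.
     *
     * With lowerFactor = 0.0 the lower check degenerates to "exist > 0",
     * the weakest variant mentioned in the comment.
     */
    public static boolean withinBounds(long exist, long totalCount, double fpp,
                                       double lowerFactor, double upperFactor) {
        // Number of false positives we expect for the configured fpp.
        double expected = totalCount * fpp;
        boolean aboveLower = exist > expected * lowerFactor;
        boolean belowUpper = exist < expected * upperFactor;
        return aboveLower && belowUpper;
    }

    public static void main(String[] args) {
        long totalCount = 100_000L;
        double fpp = 0.01;

        // A plausible count passes a 0.5x .. 1.2x window (factors are assumed).
        System.out.println(withinBounds(1_000L, totalCount, fpp, 0.5, 1.2));

        // A count of zero fails even the weakest lower bound (exist > 0),
        // so a filter that never reports a false positive is not accepted.
        System.out.println(withinBounds(0L, totalCount, fpp, 0.0, 1.2));
    }
}
```

In a JUnit test this would amount to two assertions per fpp value, one for each side of the window. A tight lower factor such as 0.9 may make the test flaky on small samples, which is presumably why looser alternatives are listed as well.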




> Add BloomFilter fpp config
> --------------------------
>
>                 Key: PARQUET-2157
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2157
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>            Reporter: Huaxin Gao
>            Priority: Major
>
> Currently parquet-mr hardcodes the bloom filter fpp (false positive 
> probability) to 0.01. We should add a config that lets users specify the fpp.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
