[ https://issues.apache.org/jira/browse/PARQUET-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17555118#comment-17555118 ]
ASF GitHub Bot commented on PARQUET-2157:
-----------------------------------------
huaxingao commented on PR #975:
URL: https://github.com/apache/parquet-mr/pull/975#issuecomment-1157762209
> It should be good enough to also check the lower limit, e.g.
> `exist > totalCount * (testFpp[i] * 0.9)`,
> `exist > totalCount * (testFpp[i] * 0.5)`, or even `exist > 0`. What do you
> think? This way, we'll be certain the test doesn't pass just because `exist`
> is 0.
Thanks for the suggestion! I couldn't find a reliable value for the lower
limit, so I used `exist > 0`.
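A minimal sketch of the kind of check discussed above, assuming a hand-rolled classic Bloom filter. `SimpleBloomFilter`, `countFalsePositives`, and the 1.5x upper slack are all hypothetical; this is not parquet-mr's test or its `BlockSplitBloomFilter`. Only values that were never inserted are probed, so every hit counts toward `exist`. The upper bound catches a filter that is too leaky, and `exist > 0` catches one that never reports anything, so the test cannot pass vacuously.

```java
import java.util.BitSet;

public class FppBoundsSketch {

  /** Hypothetical classic Bloom filter over long values (not parquet-mr's BlockSplitBloomFilter). */
  static final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int expectedItems, double fpp) {
      double ln2 = Math.log(2);
      // Textbook sizing: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
      this.numBits = (int) Math.ceil(-expectedItems * Math.log(fpp) / (ln2 * ln2));
      this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedItems * ln2));
      this.bits = new BitSet(numBits);
    }

    // SplitMix64-style mixing so that consecutive inputs spread across the bit array.
    private static long mix(long value) {
      long z = value + 0x9E3779B97F4A7C15L;
      z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
      z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
      return z ^ (z >>> 31);
    }

    // Double hashing: bit index i is derived from h1 + i * h2, taken modulo numBits.
    private int index(long hash, int i) {
      int h1 = (int) hash;
      int h2 = (int) (hash >>> 32);
      return Math.floorMod(h1 + i * h2, numBits);
    }

    void insert(long value) {
      long hash = mix(value);
      for (int i = 0; i < numHashes; i++) {
        bits.set(index(hash, i));
      }
    }

    boolean mightContain(long value) {
      long hash = mix(value);
      for (int i = 0; i < numHashes; i++) {
        if (!bits.get(index(hash, i))) {
          return false;
        }
      }
      return true;
    }
  }

  /** Inserts `inserted` even numbers, then probes `totalCount` odd numbers: every hit is a false positive. */
  static int countFalsePositives(double fpp, int inserted, int totalCount) {
    SimpleBloomFilter filter = new SimpleBloomFilter(inserted, fpp);
    for (long v = 0; v < inserted; v++) {
      filter.insert(2 * v);
    }
    int exist = 0;
    for (long v = 0; v < totalCount; v++) {
      if (filter.mightContain(2 * v + 1)) {
        exist++;
      }
    }
    return exist;
  }

  public static void main(String[] args) {
    double[] testFpp = {0.01, 0.05, 0.1};
    int totalCount = 100_000;
    double upperSlack = 1.5; // assumed tolerance, see the note below
    for (double fpp : testFpp) {
      int exist = countFalsePositives(fpp, 10_000, totalCount);
      boolean upperOk = exist < totalCount * (fpp * upperSlack);
      boolean lowerOk = exist > 0; // fails if the filter never reports a hit at all
      System.out.printf("fpp=%.2f exist=%d upperOk=%b lowerOk=%b%n", fpp, exist, upperOk, lowerOk);
    }
  }
}
```

In this sketch the slack on the upper bound is deliberate. With an integer number of hash functions, the theoretical rate already sits slightly above the target (for p = 0.01 the rounded k = 7 gives about 1.004%), and double hashing adds a little more. A strict `exist < totalCount * fpp` could therefore fail by chance.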
> Add BloomFilter fpp config
> --------------------------
>
> Key: PARQUET-2157
> URL: https://issues.apache.org/jira/browse/PARQUET-2157
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-mr
> Reporter: Huaxin Gao
> Priority: Major
>
> Currently parquet-mr hardcodes the bloom filter fpp (false positive
> probability) to 0.01. We should add a config that lets users specify the fpp.
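To show why the fpp is worth exposing as a config, here is a sketch of the classic Bloom filter sizing formula m = -n * ln(p) / (ln 2)^2. The class and method names are hypothetical, the one-million distinct-value count is an illustrative assumption, and parquet-mr's own sizing may round or cap the result differently.

```java
public class BloomSizingSketch {

  /** Textbook optimal bit count m = -n * ln(p) / (ln 2)^2 for n distinct values at target fpp p. */
  static long optimalNumOfBits(long n, double p) {
    if (n <= 0 || p <= 0.0 || p >= 1.0) {
      throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
    }
    double ln2 = Math.log(2);
    return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
  }

  public static void main(String[] args) {
    long ndv = 1_000_000L; // hypothetical number of distinct values in a column chunk
    for (double fpp : new double[] {0.1, 0.01, 0.001}) {
      long bits = optimalNumOfBits(ndv, fpp);
      // Print the filter size in bits and (rounded down) KiB for each target fpp.
      System.out.printf("fpp=%-6s bits=%d (~%d KiB)%n", fpp, bits, bits / 8 / 1024);
    }
  }
}
```

Each tenfold reduction in fpp costs about ln(10) / (ln 2)^2, roughly 4.79 extra bits per distinct value. A fixed 0.01 is therefore one particular size/accuracy trade-off that some workloads would reasonably set differently.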
--
This message was sent by Atlassian Jira
(v8.20.7#820007)