Yitong Zhou created HADOOP-11727:
------------------------------------
Summary: Make org.apache.hadoop.util.bloom.BloomFilter return the
expected false positive probability
Key: HADOOP-11727
URL: https://issues.apache.org/jira/browse/HADOOP-11727
Project: Hadoop Common
Issue Type: Improvement
Reporter: Yitong Zhou
When using a Bloom filter, it is sometimes handy to know its current expected
false positive rate, (bitSet's cardinality / vector size)^(# of hash functions),
so that when the rate gets too high we can rebuild the filter at a larger
size.
The code would look like this:
{code}
/**
 * Returns the expected false positive probability of the current filter.
 *
 * @return The expected false positive probability
 */
public double expectedFalsePositiveProbability() {
  return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
}
{code}
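For illustration only, here is a minimal standalone sketch of the same estimate and of how a caller might act on it. It assumes a plain java.util.BitSet in place of the filter's internal bit vector; the class name FalsePositiveEstimate, the shouldRebuild helper and the maxRate threshold are hypothetical and not part of Hadoop:

```java
import java.util.BitSet;

// Standalone illustration; this is NOT the Hadoop BloomFilter class.
final class FalsePositiveEstimate {

    /** Estimates the false positive probability as (set bits / vectorSize)^nbHash. */
    static double expectedFalsePositiveProbability(BitSet bits, int vectorSize, int nbHash) {
        if (vectorSize <= 0 || nbHash <= 0) {
            throw new IllegalArgumentException("vectorSize and nbHash must be positive");
        }
        return Math.pow((double) bits.cardinality() / vectorSize, nbHash);
    }

    /** Hypothetical policy: suggest a rebuild once the estimate exceeds maxRate. */
    static boolean shouldRebuild(BitSet bits, int vectorSize, int nbHash, double maxRate) {
        return expectedFalsePositiveProbability(bits, vectorSize, nbHash) > maxRate;
    }

    public static void main(String[] args) {
        int vectorSize = 1000;
        int nbHash = 3;
        BitSet bits = new BitSet(vectorSize);
        bits.set(0, 500); // pretend half of the bits have been set by insertions

        System.out.println(expectedFalsePositiveProbability(bits, vectorSize, nbHash));
        System.out.println(shouldRebuild(bits, vectorSize, nbHash, 0.01));
    }
}
```

With half of the bits set and three hash functions, the estimate is 0.5^3, so a caller with a 1% tolerance would decide to rebuild.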
Does this sound like a reasonable minor function to add to the
code base?