Robert Stupp created CASSANDRA-9150:
---------------------------------------
Summary: BloomFilter false-positive-ratio does not match false-positive-chance
Key: CASSANDRA-9150
URL: https://issues.apache.org/jira/browse/CASSANDRA-9150
Project: Cassandra
Issue Type: Improvement
Reporter: Robert Stupp
As part of CASSANDRA-8413 I wrote a test that checks bloom filter
false-positive-ratio and compares it to the target false-positive-chance.
[The
test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
basically creates a BF using {{FilterFactory.getFilter}} with the number of
elements to be added and the FPC. It then adds that number of elements to the
filter. It checks the false-positive-ratio by calling {{IFilter.isPresent}}
using not-added keys.
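The measurement procedure described above can be sketched in isolation as follows. This is only an illustration under stated assumptions: {{ToyBloomFilter}}, {{FprProbe}} and {{measureFpr}} are hypothetical names, the filter is a toy built on {{java.util.BitSet}} with a SplitMix64-style mixer and double hashing, and none of it is Cassandra's {{FilterFactory}} or {{IFilter}} code.

```java
import java.util.BitSet;
import java.util.Locale;

final class FprProbe {
    // Toy Bloom filter for illustration only; NOT Cassandra's implementation.
    static final class ToyBloomFilter {
        private final BitSet bits;
        private final int size;
        private final int hashCount;

        ToyBloomFilter(int elements, int bitsPerElement, int hashCount) {
            this.size = Math.multiplyExact(elements, bitsPerElement);
            this.bits = new BitSet(size);
            this.hashCount = hashCount;
        }

        // SplitMix64 finalizer, used here as a cheap 64-bit mixing function.
        private static long mix(long z) {
            z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
            z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
            return z ^ (z >>> 31);
        }

        // Double hashing: the i-th probe position is (h1 + i * h2) mod size.
        private int index(long key, int i) {
            long h1 = mix(key);
            long h2 = mix(key + 0x9E3779B97F4A7C15L);
            return (int) Math.floorMod(h1 + i * h2, (long) size);
        }

        void add(long key) {
            for (int i = 0; i < hashCount; i++) {
                bits.set(index(key, i));
            }
        }

        boolean isPresent(long key) {
            for (int i = 0; i < hashCount; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Adds keys 0..elements-1, then probes the same number of keys that were never added.
    static double measureFpr(int elements, int bitsPerElement, int hashCount) {
        ToyBloomFilter filter = new ToyBloomFilter(elements, bitsPerElement, hashCount);
        for (int key = 0; key < elements; key++) {
            filter.add(key);
        }
        int falsePositives = 0;
        for (long key = elements; key < 2L * elements; key++) {
            if (filter.isPresent(key)) {
                falsePositives++;
            }
        }
        return (double) falsePositives / elements;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10_000, 100_000, 1_000_000}) {
            System.out.printf(Locale.ROOT, "elements=%d fpr=%.6f%n", n, measureFpr(n, 10, 5));
        }
    }
}
```

Probing with keys from a range disjoint from the inserted range guarantees that every positive answer is a false positive, so the ratio needs no further correction.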
The FPR appears to increase linearly with the number of elements and
exceeds the FPC for filters with more than 1,000,000 elements. A filter with
100M elements has an FPR of 0.94, meaning it rejects almost no absent keys.
{code}
invertedHash=false fpc=0,010000 fpr=0,000107 for Int32Type with 10000 elements, bitset capacity= 100032, spec=BloomSpecification(K=5, bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,000931 for Int32Type with 100000 elements, bitset capacity= 1000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,009405 for Int32Type with 1000000 elements, bitset capacity= 10000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,093752 for Int32Type with 10000000 elements, bitset capacity= 100000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,942765 for Int32Type with 100000000 elements, bitset capacity=1000000064, spec=BloomSpecification(K=5, bucketsPerElement=10)
{code}
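For comparison, the textbook estimate for a Bloom filter with k hash functions and b bits per element is (1 - e^(-k/b))^k, assuming independent, uniformly distributed hashes. That estimate depends only on K and bucketsPerElement, not on the number of elements, so for K=5 and bucketsPerElement=10 it comes to roughly 0.0094 at every size. A minimal sketch of the calculation follows; {{BloomMath}} and {{expectedFpr}} are hypothetical names, not part of Cassandra.

```java
import java.util.Locale;

final class BloomMath {
    private BloomMath() {}

    // Textbook approximation (1 - e^(-k/b))^k for k hash functions and b bits per element.
    // Assumes independent, uniformly distributed hash values.
    static double expectedFpr(int hashCount, int bucketsPerElement) {
        double fractionOfBitsSet = 1.0 - Math.exp(-(double) hashCount / bucketsPerElement);
        return Math.pow(fractionOfBitsSet, hashCount);
    }

    public static void main(String[] args) {
        // K=5, bucketsPerElement=10 as in the BloomSpecification printed above.
        System.out.printf(Locale.ROOT, "expected fpr=%.6f%n", expectedFpr(5, 10));
    }
}
```

The measured value for 1,000,000 elements (0,009405) is close to this estimate, while the smaller and larger filters deviate from it by roughly a factor of ten per step.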
Am I measuring something wrong here?