Robert Stupp created CASSANDRA-9150:
---------------------------------------

             Summary: BloomFilter false-positive-ratio does not match 
false-positive-chance
                 Key: CASSANDRA-9150
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9150
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Robert Stupp


As part of CASSANDRA-8413 I wrote a test that checks bloom filter 
false-positive-ratio and compares it to the target false-positive-chance.

[The 
test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
 basically creates a BF using {{FilterFactory.getFilter}} with the number of 
elements to be added and the FPC. It then adds that number of elements to the 
filter. It checks the false-positive-ratio by calling {{IFilter.isPresent}} 
using not-added keys.

It feels that the FPR increases linearly with the number of elements and 
exceeds the FPC for filters with more than 1,000,000 elements. Filters with 
100M elements have an FPR of .94, which is probably bad.

{code}
invertedHash=false fpc=0,010000 fpr=0,000107 for    Int32Type with      10000 
elements, bitset capacity=    100032, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,000931 for    Int32Type with     100000 
elements, bitset capacity=   1000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,009405 for    Int32Type with    1000000 
elements, bitset capacity=  10000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,093752 for    Int32Type with   10000000 
elements, bitset capacity= 100000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,942765 for    Int32Type with  100000000 
elements, bitset capacity=1000000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
{code}

Do I measure something wrong here?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to