[ 
https://issues.apache.org/jira/browse/CASSANDRA-9150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-9150:
------------------------------------
    Description: 
As part of CASSANDRA-8413 I wrote a test that checks bloom filter 
false-positive-ratio and compares it to the target false-positive-chance.

[The 
test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
 basically creates a BF using {{FilterFactory.getFilter}} with the number of 
elements to be added and the FPC. It then adds that number of elements to the 
filter. It checks the false-positive-ratio by calling {{IFilter.isPresent}} 
using not-added keys.

It feels that the FPR increases linearly with the number of elements and 
exceeds the FPC for filters with more than 1,000,000 elements. Filters with 
100M elements have an FPR of .94, which is probably bad.

{code}
invertedHash=false fpc=0,010000 fpr=0,000107 for    Int32Type with      10000 
elements, bitset capacity=    100032, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,000931 for    Int32Type with     100000 
elements, bitset capacity=   1000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,009405 for    Int32Type with    1000000 
elements, bitset capacity=  10000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,093752 for    Int32Type with   10000000 
elements, bitset capacity= 100000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,942765 for    Int32Type with  100000000 
elements, bitset capacity=1000000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
{code}

Do I measure something wrong here?

EDIT: it does not depend on the type or the FPC - it's the same pattern for 
other types and other FPCs.

  was:
As part of CASSANDRA-8413 I wrote a test that checks bloom filter 
false-positive-ratio and compares it to the target false-positive-chance.

[The 
test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
 basically creates a BF using {{FilterFactory.getFilter}} with the number of 
elements to be added and the FPC. It then adds that number of elements to the 
filter. It checks the false-positive-ratio by calling {{IFilter.isPresent}} 
using not-added keys.

It feels that the FPR increases linearly with the number of elements and 
exceeds the FPC for filters with more than 1,000,000 elements. Filters with 
100M elements have an FPR of .94, which is probably bad.

{code}
invertedHash=false fpc=0,010000 fpr=0,000107 for    Int32Type with      10000 
elements, bitset capacity=    100032, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,000931 for    Int32Type with     100000 
elements, bitset capacity=   1000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,009405 for    Int32Type with    1000000 
elements, bitset capacity=  10000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,093752 for    Int32Type with   10000000 
elements, bitset capacity= 100000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
invertedHash=false fpc=0,010000 fpr=0,942765 for    Int32Type with  100000000 
elements, bitset capacity=1000000064, spec=BloomSpecification(K=5, 
bucketsPerElement=10)
{code}

Do I measure something wrong here?


> BloomFilter false-positive-ratio does not match false-positive-chance
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-9150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9150
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>
> As part of CASSANDRA-8413 I wrote a test that checks bloom filter 
> false-positive-ratio and compares it to the target false-positive-chance.
> [The 
> test|https://github.com/snazy/cassandra/blob/8413-bffp/test/unit/org/apache/cassandra/utils/BloomFilterTest.java#L359]
>  basically creates a BF using {{FilterFactory.getFilter}} with the number of 
> elements to be added and the FPC. It then adds that number of elements to the 
> filter. It checks the false-positive-ratio by calling {{IFilter.isPresent}} 
> using not-added keys.
> It feels that the FPR increases linearly with the number of elements and 
> exceeds the FPC for filters with more than 1,000,000 elements. Filters with 
> 100M elements have an FPR of .94, which is probably bad.
> {code}
> invertedHash=false fpc=0,010000 fpr=0,000107 for    Int32Type with      10000 
> elements, bitset capacity=    100032, spec=BloomSpecification(K=5, 
> bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,000931 for    Int32Type with     100000 
> elements, bitset capacity=   1000064, spec=BloomSpecification(K=5, 
> bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,009405 for    Int32Type with    1000000 
> elements, bitset capacity=  10000064, spec=BloomSpecification(K=5, 
> bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,093752 for    Int32Type with   10000000 
> elements, bitset capacity= 100000064, spec=BloomSpecification(K=5, 
> bucketsPerElement=10)
> invertedHash=false fpc=0,010000 fpr=0,942765 for    Int32Type with  100000000 
> elements, bitset capacity=1000000064, spec=BloomSpecification(K=5, 
> bucketsPerElement=10)
> {code}
> Do I measure something wrong here?
> EDIT: it does not depend on the type or the FPC - it's the same pattern for 
> other types and other FPCs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to