[ 
https://issues.apache.org/jira/browse/CASSANDRA-9167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703021#comment-16703021
 ] 

Benedict commented on CASSANDRA-9167:
-------------------------------------

I agree.  I just wanted to cancel 'Patch Available' here so it's out of the 
JIRA query I use for my To Do list :)

There's a related ticket CASSANDRA-6633, where I also propose separating the 
address space into differing configurations of hash functions, some 
superfluous, so that FP ratio and memory utilisation can be configured 
dynamically by loading/unloading different combinations from disk.  In this 
case, having some regions that are less memory efficient but faster, and some 
that are more memory efficient but require more indirect memory accesses, would 
permit you to make this decision at run time based on the available memory on 
the system.

> Improve bloom-filter false-positive-ratio
> -----------------------------------------
>
>                 Key: CASSANDRA-9167
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9167
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>            Priority: Minor
>              Labels: perfomance
>
> {{org.apache.cassandra.utils.BloomCalculations}} performs some table lookups 
> to calculate the bloom filter specification (size, # of hashes). Using the 
> exact maths for that computation brings a better false-positive-ratio (the 
> maths usually returns higher numbers for hash-counts).
> TL;DR increasing the number of hash-rounds brings a nice improvement. Finally 
> it's a trade-off between CPU and I/O.
> ||false-positive-chance||elements||capacity||hash count 
> new||false-positive-ratio new||hash count current||false-positive-ratio 
> current||improvement
> |0.1|10000|50048|3|0.0848|3|0.0848|0
> |0.1|100000|500032|3|0.09203|3|0.09203|0
> |0.1|1000000|5000064|3|0.0919|3|0.0919|0
> |0.1|10000000|50000064|3|0.09182|3|0.09182|0
> |0.1|100000000|500000064|3|0.091874|3|0.091874|0
> |0.01|10000|100032|7|0.0092|5|0.0107|0.1630434783
> |0.01|100000|1000064|7|0.00818|5|0.00931|0.1381418093
> |0.01|1000000|10000064|7|0.008072|5|0.009405|0.1651387512
> |0.01|10000000|100000064|7|0.008174|5|0.009375|0.146929288
> |0.01|100000000|1000000064|7|0.008197|5|0.009428|0.150176894
> |0.001|10000|150080|10|0.0008|7|0.001|0.25
> |0.001|100000|1500032|10|0.0006|7|0.00094|0.5666666667
> |0.001|1000000|15000064|10|0.000717|7|0.000991|0.3821478382
> |0.001|10000000|150000064|10|0.000743|7|0.000992|0.33512786
> |0.001|100000000|1500000064|10|0.000741|7|0.001002|0.3522267206
> |0.0001|10000|200064|13|0|10|0.0002|#DIV/0!
> |0.0001|100000|2000064|13|0.00004|10|0.0001|1.5
> |0.0001|1000000|20000064|13|0.000075|10|0.000091|0.2133333333
> |0.0001|10000000|200000064|13|0.000069|10|0.000087|0.2608695652
> |0.0001|100000000|2000000064|13|0.000068|10|0.00009|0.3235294118
> If we decide to allow more hash-rounds, it could be nicely back-ported even 
> to 2.0 without affecting existing sstables.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to