[ 
https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197470#comment-15197470
 ] 

Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

Ok, it seems after CASSANDRA-11344 we now have consistent and predictable 
results:

* Scenario A: organic compactions: bloom_filter_fp_chance (bfpc) = 0.1 vs 
lower bloom_filter_fp_chance = 0.01
** Analysis: Savings are consistent across the two bfpc values (about 59% in 
both runs). Takeaway is that with the patch you can lower bfpc from 0.1 to 
0.01 and still use less bloom filter memory than trunk does at 0.1.

||[organic1a|http://cstar.datastax.com/tests/id/3c02e674-eab2-11e5-ac91-0256e416528f]||trunk||patched||savings||
|node1|11684936|4772280|59.16%|
|node2|11704648|4791896|59.06%|
|node3|11954248|4792088|59.91%|

||[organic1b (lower bloom_filter_fp_chance)|http://cstar.datastax.com/tests/id/3c67130e-eaff-11e5-b22b-0256e416528f]||trunk||patched||savings||
|node1|23910064|9595248|59.87%|
|node2|23412280|9595000|59.02%|
|node3|23408696|9589704|59.03%|
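
As a rough sanity check on the trunk columns above, the textbook sizing formula for an optimal bloom filter predicts that going from a false-positive chance of 0.1 to 0.01 doubles the bits needed per key. The sketch below uses that generic formula, not Cassandra's actual allocation code, and assumes both runs wrote the same number of keys; the function name is hypothetical.

```python
import math

def bits_per_key(fp_chance: float) -> float:
    """Textbook optimal bloom filter size in bits per key (not Cassandra's exact sizing)."""
    return -math.log(fp_chance) / (math.log(2) ** 2)

# Theoretical memory ratio between bfpc = 0.01 and bfpc = 0.1.
ratio = bits_per_key(0.01) / bits_per_key(0.1)

# Observed trunk ratio for node1, taken from the organic1b and organic1a tables.
observed = 23910064 / 11684936

print(f"theoretical ratio {ratio:.2f}, observed ratio {observed:.2f}")
```

The observed ratio is close to the theoretical one, which is consistent with the patched figure at 0.01 landing below the trunk figure at 0.1.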

* Scenario B: major compactions: bloom_filter_fp_chance = 0.1 vs lower 
bloom_filter_fp_chance = 0.01
** Analysis: Savings are again consistent across the two bfpc values. They are 
slightly lower than in Scenario A, probably due to differences in how bloom 
filters are allocated in major compactions, but this is probably not something 
to worry about.

||[major1a|http://cstar.datastax.com/tests/id/5661a302-eab2-11e5-ac91-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8026368|3822080|52.38%|
|node3|8026368|3822080|52.38%|

||[major1b (lower bloom_filter_fp_chance)|http://cstar.datastax.com/tests/id/39f17b6e-eaff-11e5-b22b-0256e416528f]||trunk||patched||savings||
|node1|16035264|7644000|52.33%|
|node2|16052400|7644000|52.38%|
|node3|16052400|7644000|52.38%|

* Scenario C: incremental repairs
** Analysis: Savings remain consistent under incremental repair. They are 
higher here, probably because anticompaction moves top-level sstables from the 
unrepaired set to the repaired set while keeping them in the highest level. 
That leaves more sstables in the top level, and thus higher savings.

||[repair1a|http://cstar.datastax.com/tests/id/9501e088-ea33-11e5-847f-0256e416528f]||trunk||patched||savings||
|node1|12234296|4112240|66.39%|
|node2|12695872|4187680|67.02%|
|node3|12694680|4183600|67.04%|
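
The savings column in the tables above is the relative reduction from trunk to patched. A minimal sketch that recomputes it for the repair1a rows (the helper name is hypothetical, and the numbers are copied from the table):

```python
def savings_pct(trunk: int, patched: int) -> float:
    """Relative reduction from trunk to patched, as a percentage rounded to 2 places."""
    return round((1 - patched / trunk) * 100, 2)

# (trunk, patched) pairs copied from the repair1a table.
repair1a = {
    "node1": (12234296, 4112240),
    "node2": (12695872, 4187680),
    "node3": (12694680, 4183600),
}

for node, (trunk, patched) in repair1a.items():
    print(node, f"{savings_pct(trunk, patched):.2f}%")
```

The same function applied to the organic and major compaction rows reproduces their savings columns as well.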

Rebased above branch without conflicts.

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully 
> populated series.  (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually 
> been inserted, the bloom filter on the highest level only helps reject 
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level 
> sstables.  This will dramatically reduce memory usage for LCS and may even 
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)
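
The "about 90%" figure in the description follows from the geometric growth of LCS levels. A minimal sketch, assuming each level holds 10 times as much data as the one below it, every level is full, and L0 is ignored (the function name and the fanout parameter are illustrative):

```python
def top_level_fraction(levels: int, fanout: int = 10) -> float:
    """Fraction of all data held by the highest of `levels` fully populated levels."""
    sizes = [fanout ** i for i in range(1, levels + 1)]
    return sizes[-1] / sum(sizes)

for n in (2, 3, 4):
    print(f"{n} levels: {top_level_fraction(n):.1%} of data in the top level")
```

Under those assumptions the top level holds a little over 90% of the data regardless of how many levels exist, so a read for an existing partition will usually have to touch a top-level sstable anyway.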



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
