[
https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15197470#comment-15197470
]
Paulo Motta commented on CASSANDRA-9830:
----------------------------------------
Ok, it seems after CASSANDRA-11344 we now have consistent and predictable
results:
* Scenario A: organic compactions: bloom_filter_fp_chance = 0.1 vs lower
bloom_filter_fp_chance = 0.01
** Analysis: Savings are consistent across bfpc values. The takeaway is that
you can lower bloom_filter_fp_chance (a larger, more accurate filter) while
keeping about the same memory footprint: patched at 0.01 uses less than trunk
at 0.1.
||[organic1a|http://cstar.datastax.com/tests/id/3c02e674-eab2-11e5-ac91-0256e416528f]||trunk||patched||savings||
|node1|11684936|4772280|59.16%|
|node2|11704648|4791896|59.06%|
|node3|11954248|4792088|59.91%|

||[organic1b (lower bloom_filter_fp_chance)|http://cstar.datastax.com/tests/id/3c67130e-eaff-11e5-b22b-0256e416528f]||trunk||patched||savings||
|node1|23910064|9595248|59.87%|
|node2|23412280|9595000|59.02%|
|node3|23408696|9589704|59.03%|
* Scenario B: major compactions: bloom_filter_fp_chance = 0.1 vs lower
bloom_filter_fp_chance = 0.01
** Analysis: Savings are consistent across bfpc values. They are slightly
lower than in Scenario A, probably because of a difference in how bloom filters
are allocated in major compactions, but this is probably not something to worry
about.
||[major1a|http://cstar.datastax.com/tests/id/5661a302-eab2-11e5-ac91-0256e416528f]||trunk||patched||savings||
|node1|8026368|3818000|52.43%|
|node2|8026368|3822080|52.38%|
|node3|8026368|3822080|52.38%|

||[major1b (lower bloom_filter_fp_chance)|http://cstar.datastax.com/tests/id/39f17b6e-eaff-11e5-b22b-0256e416528f]||trunk||patched||savings||
|node1|16035264|7644000|52.33%|
|node2|16052400|7644000|52.38%|
|node3|16052400|7644000|52.38%|
* Scenario C: incremental repairs
** Analysis: Savings remain consistent with incremental repair. They are
higher, probably because anticompaction moves top-level sstables from
unrepaired to repaired in the highest level, so there are more sstables in the
top level and thus higher savings.
||[repair1a|http://cstar.datastax.com/tests/id/9501e088-ea33-11e5-847f-0256e416528f]||trunk||patched||savings||
|node1|12234296|4112240|66.39%|
|node2|12695872|4187680|67.02%|
|node3|12694680|4183600|67.04%|
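
For anyone re-checking the tables, the savings column appears to be the relative reduction of the patched value against trunk. A minimal sketch of that arithmetic, using the organic1a rows; the function name `savings_pct` is made up for illustration and is not part of the patch or of cstar_perf:

```python
# Trunk vs. patched values copied from the organic1a table (Scenario A).
measurements = {
    "node1": (11684936, 4772280),
    "node2": (11704648, 4791896),
    "node3": (11954248, 4792088),
}


def savings_pct(trunk, patched):
    """Percentage reduction of the patched value relative to trunk.

    Hypothetical helper for illustration; assumes savings = 1 - patched/trunk.
    """
    return (1 - patched / trunk) * 100


for node, (trunk, patched) in measurements.items():
    print(f"{node}: {savings_pct(trunk, patched):.2f}%")
```

Under that assumption the three rows come out at 59.16%, 59.06% and 59.91%, matching the table.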
Rebased above branch without conflicts.
> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
> Key: CASSANDRA-9830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
> Project: Cassandra
> Issue Type: New Feature
> Components: Compaction
> Reporter: Jonathan Ellis
> Assignee: Paulo Motta
> Priority: Minor
> Labels: performance
> Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully
> populated series. (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually
> been inserted, the bloom filter on the highest level only helps reject
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level
> sstables. This will dramatically reduce memory usage for LCS and may even
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)
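
The "about 90%" figure in the description above can be sanity-checked with a rough model. This is only a sketch under two assumptions that are not stated in the ticket: each level holds 10x the data of the one below it (the LCS default fanout), and every level is completely full. L0 is ignored, and `top_level_fraction` is a hypothetical helper name.

```python
def top_level_fraction(levels, fanout=10):
    """Fraction of all data held by the highest level.

    Assumes levels L1..Ln are full and level i holds fanout**i units of data.
    """
    sizes = [fanout ** i for i in range(1, levels + 1)]
    return sizes[-1] / sum(sizes)


for n in (2, 3, 4, 5):
    print(f"{n} levels: {top_level_fraction(n):.4f}")
```

In this model the fraction tends towards 0.9 as levels are added, which is an idealized upper bound; real clusters rarely have a completely full top level.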