[ 
https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089891#comment-15089891
 ] 

Paulo Motta commented on CASSANDRA-9830:
----------------------------------------

I managed to extract some cfstats metrics from the [stats 
json|http://cstar.datastax.com/tests/artifacts/7ebab860-b48f-11e5-9d2a-0256e416528f/stats/stats.7ebab860-b48f-11e5-9d2a-0256e416528f.json].
 Below are some observations:
* The bloom filter false positive ratio is always higher in the branch with 
{{skip_top_level_bloom_filter}}. This is expected, since the test case includes 
some reads of non-existent keys. However, the intended use case for this option 
is when the requested keys are known to be present, so it doesn't make much 
sense to test non-existent reads.
* What surprises me a bit is that the memory usage of the bloom filter is not 
always lower with the {{skip_top_level_bloom_filter}} option, as can be seen in 
the metrics for the {{blade-11-2a}} node. I suspect this might be due to the 
major compaction step, which does not skip top-level bloom filters in the 
current implementation. [~carlyeks], could you trigger another run without the 
major compaction step so we can see whether this still holds? Do you have any 
other explanation for this? Thanks!

* blade-11-2a
** trunk {noformat}
        SSTable count: 25
        SSTables in each level: [0, 10, 15, 0, 0, 0, 0, 0, 0]
        Space used (live): 4273599852
        Space used (total): 4273599852
        Bloom filter false positives: 8
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 61442184
        Bloom filter off heap memory used: 61441984
{noformat}
** skip_top_level_bloom_filter {noformat}
        SSTable count: 26
        SSTables in each level: [0, 10, 16, 0, 0, 0, 0, 0, 0]
        Space used (live): 4269482588
        Space used (total): 4269482588
        Bloom filter false positives: 272
        Bloom filter false ratio: 0.00001
        Bloom filter space used: 92524640
        Bloom filter off heap memory used: 92524560
{noformat}


* blade-11-3a
** trunk {noformat}
        SSTable count: 26
        SSTables in each level: [0, 10, 16, 0, 0, 0, 0, 0, 0]
        Space used (live): 4318124528
        Space used (total): 4318124528
        Bloom filter false positives: 17
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 69421560
        Bloom filter off heap memory used: 69421352
{noformat}
** skip_top_level_bloom_filter {noformat}
        SSTable count: 25
        SSTables in each level: [0, 10, 15, 0, 0, 0, 0, 0, 0]
        Space used (live): 4195812995
        Space used (total): 4195812995
        Bloom filter false positives: 364
        Bloom filter false ratio: 0.00001
        Bloom filter space used: 56484240
        Bloom filter off heap memory used: 56484160
{noformat}


* blade-11-4a
** trunk {noformat}
        SSTable count: 25
        SSTables in each level: [0, 10, 15, 0, 0, 0, 0, 0, 0]
        Space used (live): 4269592570
        Space used (total): 4269592570
        Bloom filter false positives: 9
        Bloom filter false ratio: 0.00000
        Bloom filter space used: 61316032
        Bloom filter off heap memory used: 61315832
{noformat}
** skip_top_level_bloom_filter {noformat}
        SSTable count: 25
        SSTables in each level: [0, 10, 15, 0, 0, 0, 0, 0, 0]
        Space used (live): 4195876894
        Space used (total): 4195876894
        Bloom filter false positives: 543
        Bloom filter false ratio: 0.00002
        Bloom filter space used: 56474560
        Bloom filter off heap memory used: 56474480
{noformat}
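
To make the comparison easier to read, the relative change in "Bloom filter space used" between trunk and the branch can be computed per node. The following is only a small sketch: the byte counts are copied from the cfstats output above, while the dictionary layout and the function names ({{relative_change}}, {{summarize}}) are made up for illustration.

```python
# Bloom filter space used (bytes), copied from the cfstats output above.
BLOOM_FILTER_SPACE = {
    "blade-11-2a": {"trunk": 61442184, "skip_top_level_bloom_filter": 92524640},
    "blade-11-3a": {"trunk": 69421560, "skip_top_level_bloom_filter": 56484240},
    "blade-11-4a": {"trunk": 61316032, "skip_top_level_bloom_filter": 56474560},
}


def relative_change(trunk_bytes, branch_bytes):
    """Return the branch's change relative to trunk, in percent."""
    return 100.0 * (branch_bytes - trunk_bytes) / trunk_bytes


def summarize(stats):
    """Map each node name to its relative bloom filter space change."""
    return {
        node: relative_change(v["trunk"], v["skip_top_level_bloom_filter"])
        for node, v in stats.items()
    }


if __name__ == "__main__":
    for node, pct in summarize(BLOOM_FILTER_SPACE).items():
        # A positive value means the branch uses more bloom filter space.
        print(f"{node}: {pct:+.1f}%")
```

With these numbers, {{blade-11-2a}} uses roughly 51% more bloom filter space on the branch, while {{blade-11-3a}} and {{blade-11-4a}} use roughly 19% and 8% less, respectively.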

> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Compaction
>            Reporter: Jonathan Ellis
>            Assignee: Paulo Motta
>            Priority: Minor
>              Labels: performance
>             Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully 
> populated series.  (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually 
> been inserted, the bloom filter on the highest level only helps reject 
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level 
> sstables.  This will dramatically reduce memory usage for LCS and may even 
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)
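
The "about 90%" figure in the description can be illustrated with a little arithmetic. The sketch below assumes that every level is fully populated and that each level holds 10 times as much data as the one below it; both are assumptions made for illustration, and the function name {{top_level_fraction}} is hypothetical.

```python
def top_level_fraction(levels, fanout=10):
    """Fraction of data held by the highest of `levels` full levels.

    Assumes level i holds fanout**i units of data (i = 1..levels),
    which is an idealized model of a fully populated LCS table.
    """
    if levels < 1:
        raise ValueError("levels must be at least 1")
    sizes = [fanout ** i for i in range(1, levels + 1)]
    return sizes[-1] / sum(sizes)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(f"{n} levels: {top_level_fraction(n):.3f}")
```

Under these assumptions the top level holds a bit over 90% of the data for any number of levels greater than one, which matches the estimate in the description.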



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
