[
https://issues.apache.org/jira/browse/CASSANDRA-9830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199816#comment-15199816
]
Paulo Motta commented on CASSANDRA-9830:
----------------------------------------
Thanks for the input [~krummas]. What I was trying to explain is that the
savings in the incremental repair case are larger because the top level ends up
with more sstables after intermingled incremental repairs in this specific test
scenario, but I don't understand exactly why that happens.
Below are two simple scenarios on trunk + CASSANDRA-11370, showing that the
final number of sstables for the same dataset can differ depending on whether an
intermediate incremental repair is run (a sketch for inspecting the repaired
status of the resulting sstables follows the two scenarios):
* Scenario A - Incremental repair only after all data is inserted:
{noformat}
ccm create test -n 2
ccm start
ccm node1 stress "write n=100K cl=QUORUM -rate threads=300 -schema replication(factor=2) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
ccm flush
ccm node1 nodetool tablestats keyspace1.standard1
SSTables in each level (repaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 10, 12, 0, 0, 0, 0, 0, 0]
Space used (live): 24999219
Space used (total): 24999219
ccm node1 stress "write n=150K cl=QUORUM -rate threads=300 -schema replication(factor=2) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
ccm flush
ccm node1 nodetool tablestats keyspace1.standard1
SSTable count: 42
SSTables in each level (repaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 10, 32, 0, 0, 0, 0, 0, 0]
Space used (live): 42634455
Space used (total): 42634455
ccm node1 nodetool repair keyspace1 standard1
SSTable count: 42
SSTables in each level (repaired): [0, 10, 32, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 42634455
Space used (total): 42634455
{noformat}
* Scenario B - Incremental repair after part of the data is inserted and again
after all data is inserted:
{noformat}
ccm create test -n 2
ccm start
ccm node1 stress "write n=100K cl=QUORUM -rate threads=300 -schema replication(factor=2) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
ccm flush
ccm node1 nodetool tablestats keyspace1.standard1
SSTable count: 22
SSTables in each level (repaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 10, 12, 0, 0, 0, 0, 0, 0]
Space used (live): 25017410
Space used (total): 25017410
ccm node1 nodetool repair keyspace1 standard1
SSTable count: 22
SSTables in each level (repaired): [0, 10, 12, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 25017410
Space used (total): 25017410
ccm node1 stress "write n=150K cl=QUORUM -rate threads=300 -schema replication(factor=2) compaction(strategy=org.apache.cassandra.db.compaction.LeveledCompactionStrategy,sstable_size_in_mb=1)"
ccm flush
ccm node1 nodetool tablestats keyspace1.standard1
Table: standard1
SSTable count: 56
SSTables in each level (repaired): [0, 10, 12, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 10, 24, 0, 0, 0, 0, 0, 0]
Space used (live): 62554649
Space used (total): 62554649
ccm node1 nodetool repair keyspace1 standard1
SSTable count: 50
SSTables in each level (repaired): [0, 10, 40, 0, 0, 0, 0, 0, 0]
SSTables in each level (unrepaired): [0, 0, 0, 0, 0, 0, 0, 0, 0]
Space used (live): 46033951
Space used (total): 46033951
{noformat}
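If it helps narrow this down, the repaired/unrepaired status of the individual
sstables after each step can also be checked with sstablemetadata. A minimal
sketch, assuming ccm's default data directory layout under ~/.ccm and that
sstablemetadata is on the PATH (adjust paths as needed):
{noformat}
# Print the "Repaired at" timestamp of every standard1 sstable on node1
# (a value of 0 means the sstable is still in the unrepaired set)
for f in ~/.ccm/test/node1/data0/keyspace1/standard1-*/*-Data.db; do
    echo "$f"
    sstablemetadata "$f" | grep "Repaired at"
done
{noformat}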
Would you know why in Scenario B the final number of sstables is larger than in
Scenario A for the same dataset? Is this a bug or expected behavior?
> Option to disable bloom filter in highest level of LCS sstables
> ---------------------------------------------------------------
>
> Key: CASSANDRA-9830
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9830
> Project: Cassandra
> Issue Type: New Feature
> Components: Compaction
> Reporter: Jonathan Ellis
> Assignee: Paulo Motta
> Priority: Minor
> Labels: performance
> Fix For: 3.x
>
>
> We expect about 90% of data to be in the highest level of LCS in a fully
> populated series. (See also CASSANDRA-9829.)
> Thus if the user is primarily asking for data (partitions) that has actually
> been inserted, the bloom filter on the highest level only helps reject
> sstables about 10% of the time.
> We should add an option that suppresses bloom filter creation on top-level
> sstables. This will dramatically reduce memory usage for LCS and may even
> improve performance as we no longer check a low-value filter.
> (This is also an idea from RocksDB.)
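For context on the memory the suggested option would save: the bloom filter
footprint of a table is already reported by nodetool, so a rough before/after
comparison could be made with something like the following (assuming the ccm
cluster from the scenarios above; exact field names may vary by version):
{noformat}
# Current bloom filter memory usage for the table used above
ccm node1 nodetool tablestats keyspace1.standard1 | grep -i "bloom filter"
{noformat}
If roughly 90% of the keys sit in the highest level, suppressing filter creation
there should remove roughly that fraction of the reported bloom filter space.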
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)