Distinguish long and short running compactions
----------------------------------------------

                 Key: CASSANDRA-2559
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2559
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Sylvain Lebresne
            Priority: Minor


Unless you have SSDs, multi-threaded compaction is mainly there to avoid
accumulating lots of newly flushed sstables while a long-running compaction is
in progress. But too many concurrent compactions are bad for random IO.
CASSANDRA-2558 will make it possible to limit the number of such concurrent
compactions, but choosing the right limit is not easy. If you pick too low a
number, you risk accumulating "young" sstables whenever two or three fairly long
compactions run at the same time. On the other hand, running several compactions
of "small" sstables concurrently is likely to be less efficient (on a spinning
disk) than running them serially.

It seems to me we could have the best of both worlds by distinguishing long and
short compactions. We could have two thread pools, one for long compactions
(whatever the exact definition is) and one for short ones. With this, even with
one thread in each pool you would avoid most of the "new sstable accumulation"
problem while making sure you never run too many concurrent compactions (note
that in theory we could stratify further than "short" and "long", but I'm not
sure the benefits would outweigh the added complexity).
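A minimal sketch of the two-pool idea, assuming a compaction is classified as "long" purely by the estimated total size of the sstables it would read. The issue deliberately leaves the definition of "long" open, so the size cutoff, the class name TwoPoolCompactionScheduler and its methods are all hypothetical and are not Cassandra's actual API. With one thread per pool, at most two compactions run at once, and small compactions of freshly flushed sstables never queue behind a long one.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch (not Cassandra's API): route each compaction to one of
 * two single-threaded pools based on an assumed size-based "long" cutoff.
 */
public class TwoPoolCompactionScheduler {
    // Assumed definition of a "long" compaction: input size at or above this many bytes.
    private final long longThresholdBytes;

    // One thread per pool: at most two compactions run concurrently, and
    // short compactions never wait behind a long one.
    private final ExecutorService shortPool = Executors.newSingleThreadExecutor();
    private final ExecutorService longPool = Executors.newSingleThreadExecutor();

    public TwoPoolCompactionScheduler(long longThresholdBytes) {
        this.longThresholdBytes = longThresholdBytes;
    }

    /** Classifies a compaction by the estimated total size of its input sstables. */
    public boolean isLong(long estimatedInputBytes) {
        return estimatedInputBytes >= longThresholdBytes;
    }

    /** Submits the compaction to the pool that matches its classification. */
    public Future<?> submit(long estimatedInputBytes, Runnable compaction) {
        ExecutorService pool = isLong(estimatedInputBytes) ? longPool : shortPool;
        return pool.submit(compaction);
    }

    /** Stops accepting work and waits (bounded) for queued compactions to finish. */
    public void shutdown() {
        shortPool.shutdown();
        longPool.shutdown();
        try {
            shortPool.awaitTermination(1, TimeUnit.MINUTES);
            longPool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical 256 MiB cutoff between "short" and "long".
        TwoPoolCompactionScheduler scheduler =
            new TwoPoolCompactionScheduler(256L * 1024 * 1024);
        Future<?> big = scheduler.submit(10L * 1024 * 1024 * 1024,
            () -> System.out.println("long compaction"));
        Future<?> small = scheduler.submit(20L * 1024 * 1024,
            () -> System.out.println("short compaction"));
        big.get();
        small.get();
        scheduler.shutdown();
    }
}
```

Because the two tasks run on different threads, the order of the two printed lines is not fixed. Stratifying further would just mean more pools and more cutoffs, which is exactly the extra complexity the issue questions.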


