Redesigned Compaction
---------------------

                 Key: CASSANDRA-1608
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
    Affects Versions: 0.7 beta 2
            Reporter: Chris Goffinet


After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
thinking on this subject and wanted to lay it out.

I propose we redo the concept of how compaction works in Cassandra. At the 
moment, compaction is kicked off based on write access patterns, not read 
access patterns. In most cases you want the opposite: you want to be able to 
track how well each SSTable is performing in the system. If we kept in-memory 
statistics for each SSTable and prioritized them by read frequency and bloom 
filter hit/miss ratio, we could intelligently group the SSTables that are read 
most often and schedule them for compaction. We could also schedule 
lower-priority maintenance on SSTables that are rarely accessed.
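
To make that concrete, here is a minimal sketch of the kind of in-memory 
per-SSTable read statistics and read-driven candidate selection I have in 
mind. The class and field names are hypothetical, not Cassandra's actual API:

{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical in-memory read statistics for one sstable.
class SSTableReadStats
{
    final String path;
    final AtomicLong reads = new AtomicLong();
    final AtomicLong bloomHits = new AtomicLong();   // filter said "maybe present"
    final AtomicLong bloomMisses = new AtomicLong(); // filter said "not present"

    SSTableReadStats(String path)
    {
        this.path = path;
    }

    // Fraction of probes the bloom filter let through; a hot sstable with a
    // high pass-through rate is a good compaction candidate.
    double bloomHitRatio()
    {
        long hits = bloomHits.get();
        long total = hits + bloomMisses.get();
        return total == 0 ? 0.0 : (double) hits / total;
    }
}

// Pick the N most-read sstables as the next compaction bucket.
class ReadDrivenCompactionSelector
{
    List<SSTableReadStats> selectCandidates(List<SSTableReadStats> all, int bucketSize)
    {
        List<SSTableReadStats> sorted = new ArrayList<>(all);
        sorted.sort(Comparator.comparingLong((SSTableReadStats s) -> s.reads.get()).reversed());
        return sorted.subList(0, Math.min(bucketSize, sorted.size()));
    }
}
{code}

The same counters could feed the lower-priority maintenance path: sstables 
that fall to the bottom of the sort are candidates for background cleanup 
rather than urgent compaction.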

I also propose we limit each SSTable to a fixed size, which would let us use 
our bloom filters in a more predictable manner. At the moment, beyond a 
certain size, the bloom filters become less reliable. Fixed sizes would also 
let us group the most-accessed data together. Currently an SSTable can grow 
to the point where large portions of its data are not actually accessed very 
often.
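
To ground the reliability point, here is a rough sketch using the textbook 
bloom filter false-positive estimate p = (1 - e^(-k*n/m))^k; the filter size 
and hash count below are made-up numbers for illustration, not Cassandra's 
defaults:

{code:java}
// Sketch: how a fixed-size bloom filter degrades as an sstable grows.
// p = (1 - e^(-k*n/m))^k for m bits, k hash functions and n keys;
// the sizes below are illustrative only.
public class BloomFalsePositiveSketch
{
    static double falsePositiveRate(long bits, int hashes, long keys)
    {
        return Math.pow(1.0 - Math.exp(-(double) hashes * keys / bits), hashes);
    }

    public static void main(String[] args)
    {
        long bits = 128L * 1024 * 1024 * 8; // a ~128 MB filter, for example
        int hashes = 7;
        for (long keys = 10_000_000L; keys <= 1_280_000_000L; keys *= 2)
            System.out.printf("%,11d keys -> estimated false-positive rate %.6f%n",
                              keys, falsePositiveRate(bits, hashes, keys));
    }
}
{code}

With a fixed sstable size, the filter can be sized for a known key count up 
front, so the false-positive rate stays predictable instead of climbing as 
the file grows.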



