support incremental sstable switching
-------------------------------------

                 Key: CASSANDRA-1658
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1658
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Peter Schuller
            Priority: Minor


I have been thinking about how to minimize the impact of compaction further 
beyond CASSANDRA-1470. 1470 deals with the impact of the compaction process 
itself in that it avoids going through the buffer cache; however, once 
compaction is complete you are still switching to new sstables which will imply 
cold reads.

Instead of switching all at once, one could keep both the old and new sstables 
around for a bit and incrementally switch over traffic to the new sstables.

A given request would go to the new or old sstable depending on e.g. the hash 
of the row key coupled with the point in time relative to compaction completion 
and relative to the intended target sstable switch-over.
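
A minimal sketch of that routing decision, in Python for brevity. It assumes a 
linear ramp over the transition window; the function name, the MD5-based 
bucketing and the parameters are illustrative assumptions, not anything that 
exists in Cassandra:

```python
import hashlib


def use_new_sstable(row_key: bytes, now: float,
                    compaction_done: float, switch_over: float) -> bool:
    """Decide whether a read for row_key should go to the new sstable.

    Hypothetical helper: assumes a linear ramp between compaction_done
    and switch_over (both timestamps in seconds).
    """
    if now >= switch_over:
        return True   # transition finished: everything reads the new sstable
    if now <= compaction_done:
        return False  # transition not started: keep reading the old sstables
    # Fraction of the transition window that has elapsed, strictly in (0, 1).
    progress = (now - compaction_done) / (switch_over - compaction_done)
    # Map the key to a stable position in [0, 1] from the first 8 digest bytes.
    digest = hashlib.md5(row_key).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64
    # A key switches once progress passes its position and never switches back.
    return position < progress
```

Since the position depends only on the key, a row that has switched should stay 
on the new sstable for the rest of the transition, so the share of traffic 
hitting cold data grows gradually instead of all at once.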

In terms of end-user configuration/mnemonics, one would specify, for a given 
column family, something like "sstable transition period per gb of data" or 
similar. The "per gb of data" would refer to the size of the newly written 
sstable after a compaction. So, for a major compaction you would wait for a 
very significant period of time since the entire database just went cold. For a 
minor compaction, you would only wait for a short period of time.
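
To make the "per gb of data" setting concrete, here is a rough sketch in 
Python. The function name and the seconds_per_gb parameter are hypothetical, 
and a "gb" is assumed to mean 2**30 bytes:

```python
def transition_seconds(new_sstable_bytes: int, seconds_per_gb: float) -> float:
    """Length of the switch-over window for a freshly compacted sstable.

    Hypothetical helper: seconds_per_gb stands in for the proposed
    per-column-family "sstable transition period per gb of data" setting.
    """
    size_gb = new_sstable_bytes / 2**30  # assumes 1 gb = 2**30 bytes
    return size_gb * seconds_per_gb


# Illustrative numbers only: at 60 seconds per gb, a 256 MB sstable from a
# minor compaction and a 500 GB sstable from a major compaction.
minor = transition_seconds(256 * 2**20, 60.0)
major = transition_seconds(500 * 2**30, 60.0)
```

With those made-up numbers the minor compaction would transition over 15 
seconds and the major compaction over 30000 seconds (a bit more than 8 hours).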

The result should be a reasonable negative impact on e.g. disk space usage 
(old and new sstables coexist during the transition), but hopefully a very 
significant positive impact in terms of making the sstable transition as 
smooth as possible for the node.

I like this because it feels pretty simple, does not rely on OS-specific 
features or on any other support from the OS beyond a "well functioning cache 
mechanism", and does not imply something hugely significant like writing our 
own page cache layer. The performance cost w.r.t. CPU should be very small, 
but the improvement in terms of disk I/O should be very significant for 
workloads where it matters.

The feature would be optional and per-sstable (or possibly global for the node).



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
