Mike Schrag created CASSANDRA-7720:
--------------------------------------

             Summary: Add a more consistent snapshot mechanism
                 Key: CASSANDRA-7720
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Mike Schrag


We’ve hit an issue with snapshotting that makes sense in hindsight, but
presents a real challenge for consistent restores:

* initiate snapshot
* snapshotting flushes table A and takes the snapshot
* insert into table A
* insert into table B
* snapshotting flushes table B and takes the snapshot
* snapshot finishes

The resulting snapshot contains the insert into B but NOT the insert into A,
even though the insert into B happened chronologically after the insert into A.
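
To make the ordering problem concrete, here is a minimal in-memory sketch of
the sequence above. It is not Cassandra code: the Table class, its
flush_and_snapshot method, and the integer timestamps are all hypothetical
stand-ins for a table with a memtable that is flushed just before its snapshot
is taken.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Write = Tuple[int, str]  # (timestamp, value); a toy model, not Cassandra's format


@dataclass
class Table:
    """Hypothetical stand-in for a table: unflushed writes plus flushed data."""
    name: str
    memtable: List[Write] = field(default_factory=list)
    flushed: List[Write] = field(default_factory=list)

    def insert(self, ts: int, value: str) -> None:
        self.memtable.append((ts, value))

    def flush_and_snapshot(self) -> List[Write]:
        # Flush pending writes, then capture everything that is on disk.
        self.flushed.extend(self.memtable)
        self.memtable.clear()
        return list(self.flushed)


a, b = Table("A"), Table("B")
snapshot: Dict[str, List[Write]] = {}

snapshot["A"] = a.flush_and_snapshot()  # snapshotting flushes A and snapshots it
a.insert(1, "a1")                       # insert into table A (too late for A)
b.insert(2, "b1")                       # insert into table B
snapshot["B"] = b.flush_and_snapshot()  # snapshotting flushes B and snapshots it

print(snapshot)
```

The snapshot ends up holding the later write to B while missing the earlier
write to A, which is the inconsistency described above.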

It makes sense when I think about what snapshot is doing, but I wonder if
snapshots should get a little fancier and behave more like what I think most
people would expect. I think something along the following lines should
happen:

For each node:
* pass a client timestamp in the snapshot call corresponding to "now"
* snapshot the tables using the existing procedure
* walk backwards through the hard-linked sstables in that snapshot
  * if the earliest update in an sstable is after the client's timestamp,
    delete that sstable from the snapshot
  * if the earliest update in the sstable is before the client's timestamp,
    look at its last update and walk backwards through the sstable
    * if any updates fall after the timestamp, copy the sstable into the
      snapshot folder only up to the point of the timestamp, then delete the
      original sstable from the snapshot (we need to copy because the
      snapshot's sstable is likely a hard link shared with the live data)
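
The per-node procedure above can be sketched roughly as follows. This is only
an illustration under stated assumptions, not an implementation against
Cassandra's sstable code: SSTable and trim_snapshot are hypothetical names, an
sstable is modeled as a plain list of (timestamp, payload) updates, and an
update whose timestamp equals the client's timestamp is assumed to be kept.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SSTable:
    """Hypothetical model of a snapshotted sstable: a name plus its updates."""
    name: str
    updates: List[Tuple[int, str]]  # (write timestamp, payload)


def trim_snapshot(sstables: List[SSTable], cutoff: int) -> List[SSTable]:
    """Return the snapshot contents restricted to updates at or before cutoff."""
    kept: List[SSTable] = []
    for sst in reversed(sstables):  # walk backwards through the snapshot
        timestamps = [ts for ts, _ in sst.updates]
        if not timestamps:
            kept.append(sst)  # nothing to trim in an empty sstable
        elif min(timestamps) > cutoff:
            continue  # earliest update is after the cutoff: drop the sstable
        elif max(timestamps) <= cutoff:
            kept.append(sst)  # entirely before the cutoff: keep the link as-is
        else:
            # Straddles the cutoff: write a trimmed copy instead of touching
            # the original, which may be a hard link shared with live data.
            trimmed = [(ts, p) for ts, p in sst.updates if ts <= cutoff]
            kept.append(SSTable(sst.name + "-trimmed", trimmed))
    kept.reverse()  # restore the original ordering
    return kept


snapshot = [
    SSTable("s1", [(10, "a"), (20, "b")]),
    SSTable("s2", [(25, "c"), (40, "d")]),
    SSTable("s3", [(50, "e")]),
]
result = trim_snapshot(snapshot, cutoff=30)
print([(s.name, s.updates) for s in result])
```

With a cutoff of 30, s1 is kept untouched, s2 is replaced by a trimmed copy
holding only the update at timestamp 25, and s3 is dropped from the snapshot.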

I think this would guarantee a chronologically consistent view across all
machines and column families within a given snapshot.


