[
https://issues.apache.org/jira/browse/CASSANDRA-7720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14090251#comment-14090251
]
Mike Schrag commented on CASSANDRA-7720:
----------------------------------------
I agree about not having any guarantees on ordering. And for our running
system, this isn't a big deal, because it will eventually become consistent.
Snapshotting is an interesting problem, though, because you can potentially
preserve, in your backups, a view of the world that you can never recover from.
With what I'm proposing, if you snapshot an entire cluster and then restore it
onto a brand new cluster, you at least get a cluster-wide consistent view of
the universe at time 't'. In the current system, you can get unlucky and
literally never get an A written to disk (we had this happen). With the
consistent time-t snapshot, your backup would be globally consistent up to a
given point, so you might get an A without a B, but you'd never get a B
without an A. The backup-and-restore case is really nasty because it's
conceptually like an infinite-duration network partition, so if you don't do
your best to capture a good view of the world, there's no eventual consistency
that will ever fix you up.
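To make the failure mode concrete, here is a toy simulation of the interleaving
described in the quoted issue below (plain Java, not Cassandra code; the table
names, timestamps, and class names are made up): each table is snapshotted at
its own flush time, so a later write to B can be captured while an earlier
write to A is lost.

    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class SnapshotRaceSketch {
        // Hypothetical write record: which table it hit, and when.
        record Write(String table, long timestamp) {}

        public static void main(String[] args) {
            // Each table is flushed and hard-linked at its own moment during the snapshot.
            Map<String, Long> snapshotTimeByTable = new LinkedHashMap<>();
            snapshotTimeByTable.put("A", 10L); // table A snapshotted at t=10
            snapshotTimeByTable.put("B", 40L); // table B snapshotted at t=40

            // Writes that arrive while the snapshot is in progress.
            List<Write> writes = List.of(
                    new Write("A", 20L),  // insert into A, after A was snapshotted
                    new Write("B", 30L)); // insert into B, before B was snapshotted

            // A write is captured only if it landed before its table's snapshot time,
            // so the later write (to B) survives while the earlier write (to A) is lost.
            for (Write w : writes) {
                boolean captured = w.timestamp() < snapshotTimeByTable.get(w.table());
                System.out.printf("write to %s at t=%d captured=%b%n",
                        w.table(), w.timestamp(), captured);
            }
        }
    }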
> Add a more consistent snapshot mechanism
> ----------------------------------------
>
> Key: CASSANDRA-7720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7720
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Mike Schrag
>
> We’ve hit an interesting issue with snapshotting, which makes sense in
> hindsight, but presents a real challenge for consistent restores:
> * initiate snapshot
> * snapshotting flushes table A and takes the snapshot
> * insert into table A
> * insert into table B
> * snapshotting flushes table B and takes the snapshot
> * snapshot finishes
> So what happens here is that we end up with B but NOT A, even though B was
> inserted chronologically after A.
> It makes sense when I think about what snapshot is doing, but I wonder if
> snapshots should actually get a little fancier and behave more like what most
> people would expect. What I think should happen is something along the lines
> of the following:
> For each node:
> * pass a client timestamp in the snapshot call corresponding to "now"
> * snapshot the tables using the existing procedure
> * walk backwards through the hard-linked sstables in that snapshot
> * if the earliest update in an sstable is after the client's timestamp, delete
> that sstable from the snapshot
> * if the earliest update in an sstable is before the client's timestamp but its
> latest update is after it, walk backwards through that sstable
> * if any updates fall after the timestamp, write a copy of that sstable into
> the snapshot folder containing only the data up to the timestamp, then delete
> the original from the snapshot (we need a copy because the snapshot is likely
> holding a shared hard-linked sstable)
> I think this would guarantee a chronologically consistent view across all
> machines and column families within a given snapshot; a rough sketch of this
> per-node step is included below.
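Below is a rough sketch of the per-node post-processing step proposed above,
using a hypothetical SnapshotSSTable abstraction; the interface and its methods
are invented for illustration and do not correspond to real Cassandra APIs.

    import java.util.List;

    public class SnapshotTrimSketch {

        // Hypothetical view of one hard-linked sstable inside a snapshot directory.
        interface SnapshotSSTable {
            long minTimestamp();                     // earliest update in the sstable
            long maxTimestamp();                     // latest update in the sstable
            void deleteFromSnapshot();               // remove the snapshot's hard link
            void copyTruncatedTo(long cutoffMicros); // write a copy keeping only
                                                     // data with timestamp <= cutoff
        }

        // cutoffMicros is the client-supplied "now" passed with the snapshot call.
        static void trimSnapshot(List<SnapshotSSTable> snapshotSSTables, long cutoffMicros) {
            for (SnapshotSSTable sstable : snapshotSSTables) {
                if (sstable.minTimestamp() > cutoffMicros) {
                    // Every update is newer than the cutoff: drop the whole sstable.
                    sstable.deleteFromSnapshot();
                } else if (sstable.maxTimestamp() > cutoffMicros) {
                    // Straddles the cutoff: keep a copy truncated at the cutoff and
                    // drop the original, since the shared hard link can't be
                    // rewritten in place.
                    sstable.copyTruncatedTo(cutoffMicros);
                    sstable.deleteFromSnapshot();
                }
                // Otherwise every update is at or before the cutoff: keep the sstable.
            }
        }
    }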