[
https://issues.apache.org/jira/browse/CASSANDRA-14145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated CASSANDRA-14145:
---------------------------------------
Labels: pull-request-available (was: )
> Detecting data resurrection during read
> ----------------------------------------
>
> Key: CASSANDRA-14145
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14145
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Assignee: Sam Tunnicliffe
> Priority: Minor
> Labels: pull-request-available
> Fix For: 4.0
>
>
> We have seen several bugs in which deleted data gets resurrected. We should
> try to detect this on the read path and possibly fix it. Here are a few
> examples of how data was brought back:
> A replica lost an sstable on startup, which caused the replica to lose the
> tombstone but not the data. This tombstone was past gc grace, which means
> this could resurrect data. We can detect such invalid states by looking at
> other replicas.
> If we are running incremental repair, Cassandra will keep repaired and
> non-repaired data separate. Every time incremental repair runs, it will
> move the data from non-repaired to repaired. Repaired data across all
> replicas should be 100% consistent.
> Here is an example of how we can detect and mitigate the issue in most cases.
> Say we have 3 machines: A, B and C. All these machines will have data split
> between repaired and non-repaired.
> 1. Machine A, due to some bug, brings back data D. This data D is in the
> repaired dataset. All other replicas will have data D and tombstone T.
> 2. A read for data D comes from the application and involves replicas A and
> B. The data being read is in the repaired state. A will respond to the
> co-ordinator with data D, and B will send nothing as the tombstone is past
> gc grace. This will cause a digest mismatch.
> 3. This patch will only kick in when there is a digest mismatch. The
> co-ordinator will ask both replicas to send back all data like we do today,
> but with this patch, each replica will also report whether the data it is
> returning comes from repaired or non-repaired. If the data coming from
> repaired does not match, we know something is wrong. At this point, the
> co-ordinator cannot determine whether replica A has resurrected some data or
> replica B has lost some data. We can still log an error saying we hit an
> invalid state.
> 4. Besides the log, we can take this further and even correct the response to
> the query. After logging an invalid state, we can ask replicas A and B (and
> also C if alive) to send back all data for this read, including gcable
> tombstones. If any machine returns a tombstone which is newer than this
> data, we know we cannot return this data. This way we can avoid returning
> data which has been deleted.
> Some challenges with this:
> 1. When data is moved from non-repaired to repaired, there could be a
> race. We can look at which incremental repairs have promoted things on
> which replica to avoid false positives.
> 2. If the third replica is down and the live replicas do not have any
> tombstone, we won't be able to break the tie in deciding whether data was
> actually deleted or resurrected.
> 3. If the read is for the latest data only, we won't be able to detect it as
> the read will be served from non-repaired data.
> 4. If the replica where we lose a tombstone is the last replica to compact
> the tombstone, we won't be able to decide whether data is coming back or the
> rest of the replicas have lost that data. But we will still detect that
> something is wrong.
> 5. We won't affect 99.9% of the read queries as we only do extra work during
> a digest mismatch.
> 6. CL.ONE reads will not be able to detect this.
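
Steps 3 and 4 of the quoted description could be sketched roughly as below. This is only an illustrative model under stated assumptions, not the patch itself: the names ReplicaResponse, repaired_data_mismatch and resolve are hypothetical, each replica is assumed to report a digest of its repaired data plus the timestamps of data D and of any tombstone (gcable ones included), and a tombstone with the same timestamp as the data is assumed to win.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ReplicaResponse:
    """Hypothetical per-replica answer to a full data read after a digest mismatch."""
    replica: str
    repaired_digest: str                # digest over the repaired portion only
    data_timestamp: Optional[int]       # write time of data D, None if absent
    tombstone_timestamp: Optional[int]  # deletion time (gcable included), None if absent


def repaired_data_mismatch(responses: Iterable[ReplicaResponse]) -> bool:
    """Step 3: repaired data should be identical everywhere, so more than one
    distinct repaired digest signals an invalid state worth logging."""
    return len({r.repaired_digest for r in responses}) > 1


def _newest(values: Iterable[Optional[int]]) -> Optional[int]:
    """Largest non-None timestamp, or None when every value is None."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


def resolve(responses: List[ReplicaResponse]) -> str:
    """Step 4: decide whether data D may be returned.

    Returns "absent" if no replica has D, "deleted" if any replica holds a
    tombstone at least as new as D (assumption: ties go to the tombstone),
    and "live" otherwise.
    """
    newest_data = _newest(r.data_timestamp for r in responses)
    newest_tombstone = _newest(r.tombstone_timestamp for r in responses)
    if newest_data is None:
        return "absent"
    if newest_tombstone is not None and newest_tombstone >= newest_data:
        return "deleted"
    return "live"


if __name__ == "__main__":
    # A resurrected D, B compacted both D and T away, C still holds D and T.
    a = ReplicaResponse("A", "digest-with-D", 10, None)
    b = ReplicaResponse("B", "digest-empty", None, None)
    c = ReplicaResponse("C", "digest-with-D-and-T", 10, 20)
    print(repaired_data_mismatch([a, b]))
    print(resolve([a, b, c]))
    print(resolve([a, b]))
```

With C available, the tombstone it returns shadows D and the read can drop the resurrected data. With only A and B, the sketch still flags the mismatch but resolves to "live", which is the tie described in challenge 2.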