sankalp kohli created CASSANDRA-14145:
-----------------------------------------
Summary: Detecting data resurrection during read
Key: CASSANDRA-14145
URL: https://issues.apache.org/jira/browse/CASSANDRA-14145
Project: Cassandra
Issue Type: Improvement
Reporter: sankalp kohli
Priority: Minor
We have seen several bugs in which deleted data gets resurrected. We should try
to see if we can detect this on the read path and possibly fix it. Here are a
few examples which brought back data
A replica lost an sstable on startup which caused one replica to lose the
tombstone and not the data. This tombstone was past gc grace which means this
could resurrect data. We can deduct such invalid states by looking at other
replicas.
If we are running incremental repair, Cassandra will keep repaired and
non-repaired data separate. Every-time incremental repair will run, it will
move the data from non-repaired to repaired. Repaired data across all replicas
should be 100% consistent.
Here is an example of how we can detect and mitigate the issue in most cases.
Say we have 3 machines, A,B and C. All these machines will have data split b/w
repaired and non-repaired.
1. Machine A due to some bug bring backs data D. This data D is in repaired
dataset. All other replicas will have data D and tombstone T
2. Read for data D comes from application which involve replicas A and B. The
data being read involves data which is in repaired state. A will respond back
to co-ordinator with data D and B will send nothing as tombstone is past gc
grace. This will cause digest mismatch.
3. This patch will only kick in when there is a digest mismatch. Co-ordinator
will ask both replicas to send back all data like we do today but with this
patch, replicas will respond back what data it is returning is coming from
repaired vs non-repaired. If data coming from repaired does not match, we know
there is a something wrong!! At this time, co-ordinator cannot determine if
replica A has resurrected some data or replica B has lost some data. We can
still log error in the logs saying we hit an invalid state.
4. Besides the log, we can take this further and even correct the response to
the query. After logging an invalid state, we can ask replica A and B (and also
C if alive) to send back all data for this including gcable tombstones. If any
machine returns a tombstone which is after this data, we know we cannot return
this data. This way we can avoid returning data which has been deleted.
Some Challenges with this
1. When data will be moved from non-repaired to repaired, there could be a race
here. We can look at which incremental repairs have promoted things on which
replica to avoid false positives.
2. If the third replica is down and live replica does not have any tombstone,
we wont be able to break the tie in deciding whether data was actually deleted
or resurrected.
3. If the read is for latest data only, we wont be able to detect it as the
read will be served from non-repaired data.
4. If the replica where we lose a tombstone is the last replica to compact the
tombstone, we wont be able to decide if data is coming back or rest of the
replicas has lost that data. But we will still detect something is wrong.
5. We wont affect 99.9% of the read queries as we only do extra work during
digest mismatch.
6. CL.ONE reads will not be able to detect this.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]