[ https://issues.apache.org/jira/browse/CASSANDRA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-8589:
---------------------------------------
    Tester:   (was: Ryan McGuire)

> Reconciliation in presence of tombstone might yield stale data
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-8589
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8589
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>             Fix For: 2.1.x, 2.2.x
>
>
> Consider 3 replicas A, B and C (so RF=3), and consider that we do the following
> sequence of actions at {{QUORUM}}, where I indicate the replicas acknowledging
> each operation (and let's assume that a replica that doesn't ack is a replica
> that doesn't get the update):
> {noformat}
> CREATE TABLE test (k text, t int, v int, PRIMARY KEY (k, t));
> INSERT INTO test(k, t, v) VALUES ('k', 0, 0); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 1, 1); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 2, 2); // acked by A, B and C
> DELETE FROM test WHERE k='k' AND t=1;         // acked by A and C
> UPDATE test SET v = 3 WHERE k='k' AND t=2;    // acked by B and C
> SELECT * FROM test WHERE k='k' LIMIT 2;       // answered by A and B
> {noformat}
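> Every operation has achieved quorum, but on the last read, A will respond
> {{0->0, tombstone 1, 2->2}} (the tombstone doesn't count toward the limit, so
> A keeps scanning until it has 2 live rows) and B will respond {{0->0, 1->1}}.
> On reconciliation, A's tombstone shadows B's {{1->1}} and row 2 comes from A
> alone, so we'll answer {{0->0, 2->2}}, which is incorrect (we should respond
> {{0->0, 2->3}}).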
> Put another way, if we have a limit, every replica honors that limit, but
> since tombstones can "suppress" results from other nodes, we may have some
> cells for which we don't actually get a quorum of responses (even though we
> globally have a quorum of replica responses).
> In practice, this probably occurs rather rarely, so the "simpler" fix is
> probably to do something similar to "short read protection": detect when
> this could have happened (based on how replica responses are reconciled) and
> issue an additional request in that case. That detection will have potential
> false positives, but I suspect we can be precise enough that those false
> positives will be very rare (we should nonetheless track how often this code
> gets triggered, and if it's more often than we expect, we could proactively
> bump user limits internally to reduce those occurrences).
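
The failure in the scenario above can be reproduced with a small model. This is a sketch under simplifying assumptions, not Cassandra's actual read path. Each replica is assumed to hold one cell per clustering key as a hypothetical (timestamp, value) pair, with None standing for a row tombstone. Write timestamps 1 to 5 are assumed to follow statement order. The helpers {{replica_read}} and {{reconcile}} are illustrative names only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical cell model: (write_timestamp, value); value None is a row tombstone.
Cell = Tuple[int, Optional[int]]


def replica_read(replica: Dict[int, Cell], limit: int) -> Dict[int, Cell]:
    """Replica side: return rows in clustering order until `limit` live rows are seen.

    Tombstones are sent back to the coordinator but do not count toward the limit.
    """
    response: Dict[int, Cell] = {}
    live = 0
    for t in sorted(replica):
        if live == limit:
            break
        cell = replica[t]
        response[t] = cell
        if cell[1] is not None:
            live += 1
    return response


def reconcile(responses: List[Dict[int, Cell]], limit: int) -> Dict[int, int]:
    """Coordinator side: highest timestamp wins per row, then re-apply the limit."""
    merged: Dict[int, Cell] = {}
    for response in responses:
        for t, cell in response.items():
            # On a timestamp tie the first response seen is kept (simplification).
            if t not in merged or cell[0] > merged[t][0]:
                merged[t] = cell
    result: Dict[int, int] = {}
    for t in sorted(merged):
        value = merged[t][1]
        if value is None:
            continue  # row shadowed by a tombstone
        if len(result) == limit:
            break
        result[t] = value
    return result


# Assumed timestamps: inserts at 1-3, DELETE at 4, UPDATE at 5.
A = {0: (1, 0), 1: (4, None), 2: (3, 2)}  # got the DELETE, missed the UPDATE
B = {0: (1, 0), 1: (2, 1), 2: (5, 3)}     # missed the DELETE, got the UPDATE
C = {0: (1, 0), 1: (4, None), 2: (5, 3)}  # got every write

answer = reconcile([replica_read(A, 2), replica_read(B, 2)], limit=2)
truth = reconcile([replica_read(C, 2)], limit=2)
print(answer)  # {0: 0, 2: 2}  (stale)
print(truth)   # {0: 0, 2: 3}
```

Reading from A and B reproduces the stale {{0->0, 2->2}}. C saw every write, and reading from it alone gives the expected {{0->0, 2->3}}.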

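The detection proposed in the ticket, by analogy with short read protection, could look roughly like the following. This is a hypothetical heuristic, not the fix adopted for this ticket. A response that returned fewer live rows than the limit is assumed to have exhausted the partition. Otherwise, it only vouches for clustering keys up to the last one it sent. Any result row vouched for by fewer than {{block_for}} responses would trigger a follow-up request. The names {{covered_until}}, {{under_covered_rows}} and {{block_for}} are illustrative.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical cell model: (write_timestamp, value); value None is a row tombstone.
Cell = Tuple[int, Optional[int]]


def covered_until(response: Dict[int, Cell], limit: int) -> float:
    """Highest clustering key a single replica response speaks for.

    Fewer than `limit` live rows means the replica ran out of data, so it
    vouches for every key; otherwise it was cut off after its last key.
    """
    live = sum(1 for _, value in response.values() if value is not None)
    if live < limit:
        return float("inf")
    return max(response)


def under_covered_rows(responses: List[Dict[int, Cell]], limit: int,
                       result_keys: List[int], block_for: int) -> Set[int]:
    """Result rows that fewer than `block_for` responses actually covered."""
    bounds = [covered_until(r, limit) for r in responses]
    return {t for t in result_keys if sum(1 for b in bounds if t <= b) < block_for}


# Responses from the ticket's read: A with its tombstone, B cut off by LIMIT 2.
resp_a = {0: (1, 0), 1: (4, None), 2: (3, 2)}
resp_b = {0: (1, 0), 1: (2, 1)}
print(under_covered_rows([resp_a, resp_b], limit=2, result_keys=[0, 2], block_for=2))
# {2}: only A vouched for row 2, so B would be asked for rows after t=1
```

A replica may return exactly {{limit}} live rows and then happen to reach the end of the partition. The sketch still treats it as cut off, which is one source of the false positives mentioned in the ticket. Counting how often {{under_covered_rows}} returns a non-empty set would give the tracking metric the ticket asks for.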


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
