[ https://issues.apache.org/jira/browse/CASSANDRA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philip Thompson updated CASSANDRA-8589:
---------------------------------------
    Tester:   (was: Ryan McGuire)

> Reconciliation in presence of tombstone might yield state data
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-8589
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8589
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>             Fix For: 2.1.x, 2.2.x
>
>
> Consider 3 replicas A, B, C (so RF=3) and consider that we do the following
> sequence of actions at {{QUORUM}}, where I indicate the replicas acknowledging
> each operation (and let's assume that a replica that doesn't ack is a replica
> that doesn't get the update):
> {noformat}
> CREATE TABLE test (k text, t int, v int, PRIMARY KEY (k, t))
> INSERT INTO test(k, t, v) VALUES ('k', 0, 0); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 1, 1); // acked by A, B and C
> INSERT INTO test(k, t, v) VALUES ('k', 2, 2); // acked by A, B and C
> DELETE FROM test WHERE k='k' AND t=1; // acked by A and C
> UPDATE test SET v = 3 WHERE k='k' AND t=2; // acked by B and C
> SELECT * FROM test WHERE k='k' LIMIT 2; // answered by A and B
> {noformat}
> Every operation has achieved quorum, but on the last read, A will respond
> {{0->0, tombstone 1, 2->2}} and B will respond {{0->0, 1->1}}. As a
> consequence we'll answer {{0->0, 2->2}}, which is incorrect (we should respond
> {{0->0, 2->3}}).
> Put another way, if we have a limit, every replica honors that limit, but
> since tombstones can "suppress" results from other nodes, we may have some
> cells for which we don't actually get a quorum of responses (even though we
> globally have a quorum of replica responses).
> In practice, this probably occurs rather rarely, so the "simpler" fix is
> probably to do something similar to the "short reads protection": detect when
> this could have happened (based on how replica responses are reconciled) and
> do an additional request in that case. That detection will have potential
> false positives, but I suspect we can be precise enough that those false
> positives will be very rare (we should nonetheless track how often this code
> gets triggered, and if we see that it's more often than we think, we could
> proactively bump user limits internally to reduce those occurrences).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
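
The scenario in the description can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Cassandra's read path: each replica is modeled as a dict from clustering value `t` to a `(timestamp, value)` pair, the write timestamps 1 to 5 are invented to preserve the order of the statements, and `replica_read`, `reconcile` and `under_covered` are hypothetical helper names. In particular, `under_covered` is one possible reading of "detect when this could have happened", not the fix that was actually implemented.

```python
TOMBSTONE = object()  # marker for a deleted row

def replica_read(rows, limit):
    """Scan in clustering order until `limit` live rows are collected.
    Tombstones met on the way are returned but do not count toward the limit."""
    out, live = {}, 0
    for t in sorted(rows):
        if live == limit:
            break
        out[t] = rows[t]
        if rows[t][1] is not TOMBSTONE:
            live += 1
    return out

def reconcile(responses, limit):
    """Keep the newest cell per row, drop tombstones, then apply the limit."""
    merged = {}
    for resp in responses:
        for t, cell in resp.items():
            if t not in merged or cell[0] > merged[t][0]:
                merged[t] = cell
    live = [(t, merged[t][1]) for t in sorted(merged)
            if merged[t][1] is not TOMBSTONE]
    return dict(live[:limit])

def under_covered(responses, result, limit):
    """Hypothetical detection: rows in the result that lie beyond the last row
    returned by a replica that stopped because it reached its limit."""
    suspect = set()
    for resp in responses:
        live = sum(1 for _, v in resp.values() if v is not TOMBSTONE)
        if live < limit:
            continue  # this replica ran out of rows, it has nothing more to say
        last = max(resp)
        suspect.update(t for t in result if t > last)
    return sorted(suspect)

# Replica state after the writes; timestamps are assumed, in statement order.
A = {0: (1, 0), 1: (4, TOMBSTONE), 2: (3, 2)}  # missed the UPDATE
B = {0: (1, 0), 1: (2, 1), 2: (5, 3)}          # missed the DELETE

responses = [replica_read(A, 2), replica_read(B, 2)]
stale = reconcile(responses, 2)
suspects = under_covered(responses, stale, 2)
expected = reconcile([A, B], 2)  # what merging the full replica data would give

print(stale)     # {0: 0, 2: 2}
print(suspects)  # [2]
print(expected)  # {0: 0, 2: 3}
```

Row `t=2` is flagged because B stopped at `t=1` after reaching its limit, so only A contributed a value for it. A follow-up read of the flagged rows from B would bring in `2->3`.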