[
https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16006245#comment-16006245
]
Sergio Bossa commented on CASSANDRA-8272:
-----------------------------------------
[~adelapena], I gave a first review pass and the approach looks sensible, so +1
on that.
Unfortunately, the problem is actually quite subtle and there are at least a
couple of cases where it doesn't fully work.
First of all, when a {{LIMIT}} clause is provided, the query might return no
results even though valid ones exist: the rows returned as a result of an
"index mismatch" are still counted against the limit (by {{CQLCounter}}), so
the coordinator might end up with fewer valid rows than the requested limit
simply because some replicas returned only mismatched rows. Here's a simple
scenario with two nodes:
1) Write row {{key=1,index=1}}.
2) Write row {{key=2,index=1}}.
3) Shut down node 2.
4) Delete column {{index}} from row {{key=1}}: the delete will go to node 1,
while node 2 will miss it.
5) Restart node 2 (hints need to be disabled).
6) Query for {{index=1}}.
7) Node 1 will return the first row found, i.e. the "mismatched" one {{key=1}}.
8) Node 2 will return its stale copy of {{key=1}}, as it missed the delete.
9) Coordinator will merge/post-process the rows, realize there's a mismatch and
return no results, while it should have instead returned {{key=2}}.
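To make the scenario concrete, here is a toy Python model of it. It is only a sketch under simplifying assumptions: replicas are plain dictionaries, every stored row is treated as an index candidate, and the names {{replica_read}}, {{coordinator_read}} and {{count_mismatches}} are hypothetical, not Cassandra APIs or the actual {{CQLCounter}} logic.

```python
from typing import Dict, List, Optional, Tuple

# Toy model only: key -> (indexed value, write timestamp).
# A value of None means the indexed column was deleted on that replica.
Cell = Tuple[Optional[int], int]
Replica = Dict[int, Cell]


def replica_read(replica: Replica, wanted: int, limit: int,
                 count_mismatches: bool) -> Dict[int, Cell]:
    """Scan rows in key order and stop once `limit` rows have been counted."""
    returned: Dict[int, Cell] = {}
    counted = 0
    for key in sorted(replica):
        value, timestamp = replica[key]
        returned[key] = (value, timestamp)  # mismatched rows are returned too
        if value == wanted or count_mismatches:
            counted += 1
        if counted >= limit:
            break
    return returned


def coordinator_read(replicas: List[Replica], wanted: int, limit: int,
                     count_mismatches: bool) -> List[int]:
    """Merge replica responses (newest timestamp wins), then drop mismatches."""
    merged: Dict[int, Cell] = {}
    for replica in replicas:
        for key, cell in replica_read(replica, wanted, limit, count_mismatches).items():
            if key not in merged or cell[1] > merged[key][1]:
                merged[key] = cell
    matching = [key for key in sorted(merged) if merged[key][0] == wanted]
    return matching[:limit]


# Node 1 saw the delete of `index` on key=1 (timestamp 2); node 2 missed it.
node1: Replica = {1: (None, 2), 2: (1, 1)}
node2: Replica = {1: (1, 1), 2: (1, 1)}

print(coordinator_read([node1, node2], wanted=1, limit=1, count_mismatches=True))
print(coordinator_read([node1, node2], wanted=1, limit=1, count_mismatches=False))
```

With a limit of 1, the first call prints {{[]}} because both replicas stop at {{key=1}}, while the second prints {{[2]}} because node 1 keeps scanning past the mismatched row. Not counting mismatches on the replica is just one possible direction for this scenario, not necessarily a complete fix.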
Second, this patch doesn't fix filtering; while it's true we have a different
issue for that ({{CASSANDRA-8273}}), and while we could argue filtering isn't
exactly a form of indexing, it is still used in conjunction with indexing, and
fixing indexing just to have its results invalidated when filtering is applied
seems quite confusing to me.
In the end, I'd suggest the following:
1) Stick with the current approach! It's good and I do not think using special
tombstones would buy us anything.
2) Fix the first problem above.
3) Generalize the approach so we can fix filtering and any other indexing
implementation (most notably SASI).
4) To ease the burden of porting between versions, and given this is not a
trivial bug fix at all, I'd also suggest applying it only to 3.11 onwards.
Thoughts?
> 2ndary indexes can return stale data
> ------------------------------------
>
> Key: CASSANDRA-8272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sylvain Lebresne
> Assignee: Andrés de la Peña
> Fix For: 3.0.x
>
>
> When replicas return 2ndary index results, it's possible for a single replica
> to return a stale result and that result will be sent back to the user,
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before
> having applied the update, then the now stale result will be returned (since
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and
> provided we make the index inherit the gcGrace of its parent CF), instead of
> skipping that tombstone, we'd insert in the result a corresponding range
> tombstone.