[ 
https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019835#comment-16019835
 ] 

Andrés de la Peña commented on CASSANDRA-8272:
----------------------------------------------

It seems that, to guarantee consistency, the replicas should send *all* the 
rows with a matching obsolete entry in the index. The number of rows to be sent 
can be quite large in large clusters, so this could have an appreciable impact 
on performance. 

However, this mechanism is not required when the read consistency level is ONE, 
which I suspect is the most commonly used consistency level for 2i use cases. 
So, as a performance improvement, we could keep filtering on the replica side 
when CL=ONE. 

I think we could pass the consistency level (or a boolean indicating whether 
CL=ONE) to the {{Index#searcherFor}} method, so that the index implementation 
would be expected to return only non-stale results if CL=ONE, and either stale 
or non-stale results if CL>ONE. It could be cleaner to keep this logic out of 
the index implementations, but they are in the best position to apply 
replica-side filtering efficiently at CL=ONE, because they may be able to use 
their underlying index structures to avoid reading the stale results at all, 
instead of reading them and then skipping them.
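
To make the idea a bit more concrete, here is a rough, self-contained sketch of 
the behaviour I have in mind. It does not use any real Cassandra classes: 
{{StaleAwareSearchSketch}}, {{ReadLevel}}, {{IndexEntry}} and {{search}} are 
all hypothetical names made up for illustration, and the index and base table 
are just in-memory collections. The only point is that the same lookup drops 
obsolete entries locally at CL=ONE and keeps them at CL>ONE so that the 
coordinator can reconcile the replicas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types are Cassandra's real classes.
final class StaleAwareSearchSketch {

    // Hypothetical stand-in for the read consistency level.
    enum ReadLevel { ONE, QUORUM }

    // One index entry: the indexed value and the primary key it points at.
    static final class IndexEntry {
        final String indexedValue;
        final int key;

        IndexEntry(String indexedValue, int key) {
            this.indexedValue = indexedValue;
            this.key = key;
        }
    }

    // Returns the keys a single replica would send back for `value`.
    // An entry is considered stale when the base row no longer holds `value`.
    // At ONE, stale entries are filtered on the replica side.
    // Above ONE, they are kept so the coordinator can reconcile replicas.
    static List<Integer> search(List<IndexEntry> index,
                                Map<Integer, String> baseTable,
                                String value,
                                ReadLevel level) {
        List<Integer> keys = new ArrayList<>();
        for (IndexEntry entry : index) {
            if (!entry.indexedValue.equals(value)) {
                continue; // entry is for a different indexed value
            }
            boolean stale = !value.equals(baseTable.get(entry.key));
            if (stale && level == ReadLevel.ONE) {
                continue; // replica-side filtering at CL=ONE
            }
            keys.add(entry.key);
        }
        return keys;
    }

    public static void main(String[] args) {
        // The index still holds the obsolete entry 'foo' -> 0,
        // while the base row for k=0 now holds 'bar'.
        List<IndexEntry> index = Arrays.asList(
                new IndexEntry("foo", 0), new IndexEntry("bar", 0));
        Map<Integer, String> base = Collections.singletonMap(0, "bar");

        System.out.println(search(index, base, "foo", ReadLevel.ONE));    // prints []
        System.out.println(search(index, base, "foo", ReadLevel.QUORUM)); // prints [0]
    }
}
```

A real implementation would presumably return rows rather than keys, and could 
avoid touching the obsolete entries at all at CL=ONE instead of reading and 
then discarding them as this sketch does.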

WDYT? Does it make any sense?

> 2ndary indexes can return stale data
> ------------------------------------
>
>                 Key: CASSANDRA-8272
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Andrés de la Peña
>             Fix For: 3.0.x
>
>
> When replicas return 2ndary index results, it's possible for a single replica 
> to return a stale result and that result will be sent back to the user, 
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are 
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the update but C responds to the read before 
> having applied the update, then the now stale result will be returned (since 
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and 
> provided we make the index inherit the gcGrace of its parent CF), instead of 
> skipping that tombstone, we'd insert in the result a corresponding range 
> tombstone.  



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

