[ 
https://issues.apache.org/jira/browse/CASSANDRA-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784237#comment-17784237
 ] 

Caleb Rackliffe commented on CASSANDRA-19007:
---------------------------------------------

After conversations w/ [~mbyrd], [~aratnofsky], [~mike_tr_adamson] and others, 
some historical reading around CASSANDRA-7168, and some noodling, the following 
is my current best proposal for how we could possibly fix this, at least for 
SAI, without simply delegating consistency to Accord (which I think is still 
the right long-term approach)...


1.) When writing SAI column/SSTable indexes, write a special postings list 
containing the IDs for rows that simply don't have a value for the column. When 
the SSTable attached to the index is marked repaired, this list can be dropped.
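
As a rough illustration of #1, here is a minimal Java sketch of a per-SSTable column index that keeps a "no value" postings list next to the usual value postings. All names here ({{ColumnIndexSketch}}, {{missing}}, {{markRepaired}}) are hypothetical and chosen for illustration. This is not SAI's actual on-disk format or API, and row IDs are modeled as plain ints.

```java
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of one column's index for one SSTable (not SAI's real classes).
final class ColumnIndexSketch
{
    // value -> IDs of rows in this SSTable that wrote that value
    private final Map<Integer, SortedSet<Integer>> postingsByValue = new TreeMap<>();
    // IDs of rows in this SSTable that wrote no value for this column
    private SortedSet<Integer> missingValueRows = new TreeSet<>();

    // Index one row; a null value means the row has no value for this column.
    void add(int rowId, Integer value)
    {
        if (value == null)
            missingValueRows.add(rowId);
        else
            postingsByValue.computeIfAbsent(value, v -> new TreeSet<>()).add(rowId);
    }

    // Row IDs whose indexed value equals the given value.
    SortedSet<Integer> match(int value)
    {
        return postingsByValue.getOrDefault(value, Collections.<Integer>emptySortedSet());
    }

    // Row IDs with no value for this column.
    SortedSet<Integer> missing()
    {
        return missingValueRows;
    }

    // Once the SSTable is marked repaired, the "no value" list can be dropped.
    // (This sketch assumes no rows are added after that point.)
    void markRepaired()
    {
        missingValueRows = Collections.emptySortedSet();
    }

    public static void main(String[] args)
    {
        ColumnIndexSketch b = new ColumnIndexSketch();
        b.add(0, 3);    // row 0 wrote b=3
        b.add(1, null); // row 1 wrote nothing for b
        System.out.println("b=3 -> " + b.match(3) + ", no value -> " + b.missing());
        b.markRepaired();
        System.out.println("after repair, no value -> " + b.missing());
    }
}
```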

2.) At query time, the coordinator would issue (possibly in parallel) two 
queries. The first would be against the repaired SSTables on a single replica. 
The second would be against the un-repaired SSTables on a QUORUM of replicas 
(or whatever CL above ONE is requested). The first query could execute in more or less 
the exact same way SAI queries do now, w/ AND queries being executed as 
intersections, and post-filtering applied locally before returning results to 
the coordinator. The local index queries against the (QUORUM) un-repaired set 
would be different. They would need to a.) execute AND queries as unions to 
pick up matches on individual clauses and b.) integrate matches from the 
specialized postings list mentioned in #1 above. Also they would not apply 
post-filtering to the SAI row retrieval step, returning a raw row built from 
both repaired and un-repaired data to the coordinator.
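
To make the difference between the two local queries in #2 concrete, here is a small Java sketch of candidate selection, under my reading that the un-repaired query takes the plain union of the per-clause matches and the per-column "no value" lists. {{LocalQuerySketch}} and its methods are hypothetical names, and postings are modeled as sorted sets of int row IDs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical candidate selection for an AND query on one replica.
final class LocalQuerySketch
{
    // Repaired set: AND is an intersection of the per-clause matches.
    static SortedSet<Integer> repairedCandidates(List<SortedSet<Integer>> clauseMatches)
    {
        if (clauseMatches.isEmpty())
            return new TreeSet<>();
        SortedSet<Integer> result = new TreeSet<>(clauseMatches.get(0));
        for (SortedSet<Integer> matches : clauseMatches)
            result.retainAll(matches);
        return result;
    }

    // Un-repaired set: union of the per-clause matches, plus every row that
    // has no value for one of the queried columns. No post-filtering here.
    static SortedSet<Integer> unrepairedCandidates(List<SortedSet<Integer>> clauseMatches,
                                                   List<SortedSet<Integer>> missingValueRows)
    {
        SortedSet<Integer> result = new TreeSet<>();
        for (SortedSet<Integer> matches : clauseMatches)
            result.addAll(matches);
        for (SortedSet<Integer> missing : missingValueRows)
            result.addAll(missing);
        return result;
    }

    // Convenience builder for a postings list.
    static SortedSet<Integer> ids(Integer... ids)
    {
        return new TreeSet<>(Arrays.asList(ids));
    }

    public static void main(String[] args)
    {
        // Query a=1 AND b=3 where un-repaired row 0 is the partial update (b=3).
        List<SortedSet<Integer>> matches = Arrays.asList(ids(), ids(0)); // a=1 hits nothing, b=3 hits row 0
        List<SortedSet<Integer>> missing = Arrays.asList(ids(0), ids()); // row 0 has no value for a
        System.out.println("intersection: " + repairedCandidates(matches));
        System.out.println("union + missing: " + unrepairedCandidates(matches, missing));
    }
}
```

The intersection drops row 0 because the partial update carries no value for {{a}}, while the union-based query keeps it as a candidate for the coordinator to resolve.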

3.) At the coordinator, all row results from the repaired and un-repaired sets would 
be combined and post-filtering applied. (In some cases, receiving rows from the 
un-repaired set queries on more than one replica may avoid the need for replica filtering protection.)


ex. We have a two-node cluster, and the row {{(a=1, b=2)}} exists in the 
repaired set on both nodes.
 * Node A receives the partial update {{(b=3)}} but node B does not.
 * We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
 * The repaired set index query against node B returns nothing, as we would 
expect. (A would as well.)
 * The un-repaired set query against node B returns nothing, as no un-repaired 
data exists, and node A returns a match, since the single clause {{b=3}} hits 
the partial update A received. (A returns a full row that includes data from 
the repaired set as well.)
 * At the coordinator, we now have a complete row from A {{(a=1, b=3)}}, but 
we still don't have a QUORUM. Happily, this is exactly the case 
replica-filtering protection (RFP) handles, and it could be used to complete 
the QUORUM.
 * RFP would return a full row from node B {{(a=1, b=2)}}, which would merge w/ 
node A's row to remain {{(a=1, b=3)}} and pass post-filtering.
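
A rough sketch of the RFP step in this example, assuming it boils down to "for every key returned by some contacted replica, ask the contacted replicas that stayed silent for their current version of that row". {{RfpSketch}} and {{replicasToRequery}} are hypothetical names for illustration and do not mirror the actual replica filtering protection classes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model: which replicas must be re-queried to complete each row.
final class RfpSketch
{
    // For each returned key, the contacted replicas that did not return it.
    static Map<Integer, Set<String>> replicasToRequery(Set<String> contacted,
                                                       Map<String, Set<Integer>> keysByReplica)
    {
        Set<Integer> allKeys = new TreeSet<>();
        for (Set<Integer> keys : keysByReplica.values())
            allKeys.addAll(keys);

        Map<Integer, Set<String>> result = new TreeMap<>();
        for (Integer key : allKeys)
        {
            Set<String> silent = new TreeSet<>();
            for (String replica : contacted)
                if (!keysByReplica.getOrDefault(replica, Collections.<Integer>emptySet()).contains(key))
                    silent.add(replica);
            if (!silent.isEmpty())
                result.put(key, silent);
        }
        return result;
    }

    public static void main(String[] args)
    {
        Set<String> contacted = new TreeSet<>(Arrays.asList("A", "B"));
        Map<String, Set<Integer>> keysByReplica = new HashMap<>();
        keysByReplica.put("A", new TreeSet<>(Arrays.asList(0))); // A matched the row with key 0
        keysByReplica.put("B", new TreeSet<>());                 // B returned nothing
        System.out.println(replicasToRequery(contacted, keysByReplica));
    }
}
```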

ex. Let's assume Node B had also received a partial update {{(a=2)}} in the 
example above.
 * We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
 * The repaired set index query against node B returns nothing, as we would 
expect. (A would as well.)
 * The un-repaired set query against node B returns a match against the special 
postings list as it contains no value for {{b}}, and node A returns a match, 
since the single clause {{b=3}} hits the partial update A received. (A and B 
return full rows that include data from the repaired set as well.)
 * At the coordinator, we now have a complete row from A {{(a=1, b=3)}} and B 
{{(a=2, b=2)}}. We combine this QUORUM to get {{(a=2, b=3)}}, which correctly 
fails post-filtering and returns no result to the client. Notably, no RFP is 
required, since we get a QUORUM without it.

ex. Finally, let's say node A had instead received the partial update {{(b=4)}} in 
the previous example.
 * We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
 * The repaired set index query against node B returns nothing, as we would 
expect. (A would as well.)
 * The un-repaired set query against node B returns a match against the special 
postings list as it contains no value for {{b}}, and node A returns a match 
against the special postings list as it contains no value for {{a}}. (A and B 
would explicitly not match the query against the column values that DO exist in 
the un-repaired set.)
 * At the coordinator, we now have a complete row from A {{(a=1, b=4)}} and B 
{{(a=2, b=2)}}. We combine this QUORUM to get {{(a=2, b=4)}}, which correctly 
fails post-filtering and returns no result to the client. Notably, no RFP is 
required, since we get a QUORUM without it.
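
The coordinator-side outcome of the three examples above can be replayed with a small Java sketch. It assumes cell-level last-write-wins reconciliation by timestamp, with repaired data written at timestamp 1 and the partial updates at timestamp 2. The timestamps and all names ({{CoordinatorMergeSketch}}, {{Cell}}, {{row}}, {{query}}) are made up for the sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the coordinator's merge and post-filtering step.
final class CoordinatorMergeSketch
{
    // A column value together with its write timestamp.
    static final class Cell
    {
        final int value;
        final long timestamp;

        Cell(int value, long timestamp)
        {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString()
        {
            return value + "@" + timestamp;
        }
    }

    // Merge full rows from several replicas; per column, the newest cell wins.
    static Map<String, Cell> merge(List<Map<String, Cell>> replicaRows)
    {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> row : replicaRows)
            for (Map.Entry<String, Cell> e : row.entrySet())
                merged.merge(e.getKey(), e.getValue(), (x, y) -> y.timestamp > x.timestamp ? y : x);
        return merged;
    }

    // Post-filter: every equality clause must hold on the merged row.
    static boolean passesPostFilter(Map<String, Cell> row, Map<String, Integer> clauses)
    {
        for (Map.Entry<String, Integer> clause : clauses.entrySet())
        {
            Cell cell = row.get(clause.getKey());
            if (cell == null || cell.value != clause.getValue())
                return false;
        }
        return true;
    }

    // Build a full row (a, b) with per-column timestamps.
    static Map<String, Cell> row(int a, long aTs, int b, long bTs)
    {
        Map<String, Cell> row = new HashMap<>();
        row.put("a", new Cell(a, aTs));
        row.put("b", new Cell(b, bTs));
        return row;
    }

    // Build the query "a = ? AND b = ?".
    static Map<String, Integer> query(int a, int b)
    {
        Map<String, Integer> clauses = new HashMap<>();
        clauses.put("a", a);
        clauses.put("b", b);
        return clauses;
    }

    public static void main(String[] args)
    {
        // Second example: A holds (a=1, b=3), B holds (a=2, b=2).
        Map<String, Cell> merged = merge(Arrays.asList(row(1, 1, 3, 2), row(2, 2, 2, 1)));
        System.out.println(merged + " passes: " + passesPostFilter(merged, query(1, 3)));
    }
}
```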


It may also be possible to avoid the special "nullary" postings list and rely 
more heavily on RFP, but I think that will be a separate comment...

> Queries with multi-column replica-side filtering can miss rows
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-19007
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19007
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Andres de la Peña
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>
> {{SELECT}} queries with multi-column replica-side filtering can miss rows if 
> the filtered columns are spread across out-of-sync replicas. This dtest 
> reproduces the issue:
> {code:java}
> @Test
> public void testMultiColumnReplicaSideFiltering() throws IOException
> {
>     try (Cluster cluster = init(Cluster.build().withNodes(2).start()))
>     {
>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>         // insert a split row
>         cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>         cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>         String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2 ALLOW FILTERING");
>         Object[][] initialRows = cluster.coordinator(1).execute(select, ALL);
>         assertRows(initialRows, row(0, 1, 2)); // not found!!
>     }
> }
> {code}
> This edge case affects queries using {{ALLOW FILTERING}} or any index 
> implementation.
> It affects all branches since multi-column replica-side filtering queries 
> were introduced, long before 3.0.
> The protection mechanism added by CASSANDRA-8272/8273 won't deal with this 
> case, since it only solves single-column conflicts where stale rows could 
> resurrect. This bug however doesn't resurrect data, it can only miss rows 
> while the replicas are out-of-sync.



