[https://issues.apache.org/jira/browse/CASSANDRA-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784237#comment-17784237]
Caleb Rackliffe commented on CASSANDRA-19007:
---------------------------------------------
After conversations w/ [~mbyrd], [~aratnofsky], [~mike_tr_adamson] and others,
some historical reading around CASSANDRA-7168, and some noodling, the following
is my current best proposal for how we could possibly fix this, at least for
SAI, without simply delegating consistency to Accord (which I think is still
the right long-term approach)...
1.) When writing SAI column/SSTable indexes, write a special postings list
containing the IDs for rows that simply don't have a value for the column. When
the SSTable attached to the index is marked repaired, this list can be dropped.
2.) At query time, the coordinator would issue two queries (possibly in
parallel). The first would be against the repaired SSTables on a single replica.
The second would be against the un-repaired SSTables on a QUORUM (or whatever
consistency level above ONE is requested) of replicas. The first query could
execute in essentially the same way SAI queries do now, w/ AND queries executed
as intersections and post-filtering applied locally before results are returned
to the coordinator. The local index queries against the un-repaired set (on the
QUORUM of replicas) would be different. They would need to a.) execute AND
queries as unions to pick up matches on individual clauses and b.) integrate
matches from the specialized postings list mentioned in #1 above. They would
also skip post-filtering in the SAI row retrieval step, returning to the
coordinator a raw row built from both repaired and un-repaired data.
3.) At the coordinator, all row results from the repaired and un-repaired sets
would be combined and post-filtering applied. (In some cases, receiving rows from
both un-repaired set queries may avoid the need for replica filtering protection,
or RFP.)
ex. We have a two-node cluster, and the row {{(a=1, b=2)}} exists in the
repaired set on both nodes.
* Node A receives the partial update {{(b=3)}} but node B does not.
* We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
* The repaired set index query against node B returns nothing, as we would
expect. (A would as well.)
* The un-repaired set query against node B returns nothing, as no un-repaired
data exists, and node A returns a match, since the single clause {{b=3}} hits
the partial update A received. (A returns a full row that includes data from
the repaired set as well.)
* At the coordinator, we now have a complete row from node A {{(a=1, b=3)}}, but
we still don't have a QUORUM. Happily, this is exactly the case
replica filtering protection protects us from, and it could be used to complete
the QUORUM.
* RFP would return a full row from node B {{(a=1, b=2)}}, which would merge w/
node A's row to yield {{(a=1, b=3)}} (A's {{b=3}} is the newer write) and pass
post-filtering.
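The coordinator-side check that triggers RFP here can be sketched as follows: for every row key that at least one contacted replica returned, list the contacted replicas that stayed silent, since those are the ones a follow-up query must ask for the full, unfiltered row. This is a hypothetical simplification, not the actual CASSANDRA-8272/8273 implementation; {{RfpSketch}}, {{silentReplicas}} and the replica names are illustrative.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of the coordinator-side RFP check; not the CASSANDRA-8272/8273 code.
final class RfpSketch
{
    // For each row key returned by at least one contacted replica, list the contacted replicas
    // that did not return it. Those replicas must be asked for the full, unfiltered row.
    static Map<Integer, List<String>> silentReplicas(Map<String, Set<Integer>> returnedKeysByReplica)
    {
        // Sort replicas by name so the output is deterministic.
        Map<String, Set<Integer>> byReplica = new TreeMap<>(returnedKeysByReplica);
        Set<Integer> allKeys = new TreeSet<>();
        byReplica.values().forEach(allKeys::addAll);

        Map<Integer, List<String>> toFetch = new TreeMap<>();
        for (int key : allKeys)
        {
            List<String> silent = new ArrayList<>();
            for (Map.Entry<String, Set<Integer>> entry : byReplica.entrySet())
                if (!entry.getValue().contains(key))
                    silent.add(entry.getKey());
            // An empty list means every contacted replica returned the row, so no RFP fetch is needed.
            if (!silent.isEmpty())
                toFetch.put(key, silent);
        }
        return toFetch;
    }
}
{code}

In this first example A returns the row and B returns nothing, so B is flagged for a follow-up fetch; in the next two examples both replicas return the row, so nothing is flagged.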
ex. Now let's assume node B had also received a partial update {{(a=2)}} in the
example above.
* We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
* The repaired set index query against node B returns nothing, as we would
expect. (A would as well.)
* The un-repaired set query against node B returns a match against the special
postings list, as it contains no value for {{b}}, and node A returns a match,
since the single clause {{b=3}} hits the partial update A received. (A and B
return full rows that include data from the repaired set as well.)
* At the coordinator, we now have a complete row from node A {{(a=1, b=3)}} and
from node B {{(a=2, b=2)}}. Merging this QUORUM gives {{(a=2, b=3)}}, which
correctly fails post-filtering, so no result is returned to the client. Notably,
no RFP is required, since we have a QUORUM without it.
ex. Finally, let's say node A had instead received the partial update {{(b=4)}}
in the previous example.
* We issue a query at {{ALL/QUORUM}} for {{a=1 AND b=3}}.
* The repaired set index query against node B returns nothing, as we would
expect. (A would as well.)
* The un-repaired set query against node B returns a match against the special
postings list, as it contains no value for {{b}}, and node A returns a match
against the special postings list, as it contains no value for {{a}}. (Neither A
nor B matches on the column values that DO exist in its un-repaired set: {{b=4}}
fails {{b=3}}, and {{a=2}} fails {{a=1}}.)
* At the coordinator, we now have a complete row from node A {{(a=1, b=4)}} and
from node B {{(a=2, b=2)}}. Merging this QUORUM gives {{(a=2, b=4)}}, which
correctly fails post-filtering, so no result is returned to the client. Notably,
no RFP is required, since we have a QUORUM without it.
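The three examples can be replayed with a small, hypothetical Java model of one row on two replicas. The timestamps are assumptions chosen to fit the narrative (repaired data at t=1, later partial updates at t=2 and t=3), the class and method names are illustrative, and RFP is approximated by merging every replica's raw row once any replica's un-repaired set matched. Tombstones, TTLs and timestamp ties are ignored, and the repaired-set query is left out because it returns nothing in all three examples.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replay of the worked examples: one row, two replicas, equality clauses only.
final class ExampleReplay
{
    // A cell value with its (assumed) write timestamp.
    record Cell(int value, long ts) {}

    // Step 2 on one replica's un-repaired data for the row: any clause matching a held value,
    // or any clause column with no held value (nullary postings), qualifies the row.
    static boolean unrepairedMatch(Map<String, Cell> unrepaired, Map<String, Integer> clauses)
    {
        if (unrepaired.isEmpty())
            return false; // the row is absent from this replica's un-repaired SSTables
        for (Map.Entry<String, Integer> clause : clauses.entrySet())
        {
            Cell cell = unrepaired.get(clause.getKey());
            if (cell == null || cell.value() == clause.getValue())
                return true;
        }
        return false;
    }

    // Last-write-wins merge of partial rows (column -> cell).
    static Map<String, Cell> merge(List<Map<String, Cell>> rows)
    {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> row : rows)
            row.forEach((column, cell) -> merged.merge(column, cell, (x, y) -> y.ts() > x.ts() ? y : x));
        return merged;
    }

    // Step 3: if any replica matched, merge every replica's raw row (rows from silent replicas
    // stand in for what RFP would fetch), then post-filter the merged row at the coordinator.
    static boolean queryMatches(Map<String, Cell> repaired, List<Map<String, Cell>> unrepairedByReplica,
                                Map<String, Integer> clauses)
    {
        boolean anyMatch = false;
        List<Map<String, Cell>> rawRows = new ArrayList<>();
        for (Map<String, Cell> unrepaired : unrepairedByReplica)
        {
            anyMatch |= unrepairedMatch(unrepaired, clauses);
            // Each replica's raw row combines its repaired and un-repaired data.
            rawRows.add(merge(List.of(repaired, unrepaired)));
        }
        if (!anyMatch)
            return false;
        Map<String, Cell> row = merge(rawRows);
        for (Map.Entry<String, Integer> clause : clauses.entrySet())
        {
            Cell cell = row.get(clause.getKey());
            if (cell == null || cell.value() != clause.getValue())
                return false;
        }
        return true;
    }
}
{code}

With repaired data {{(a=1@t1, b=2@t1)}} and the query {{a=1 AND b=3}}, the first example returns the row, while the second and third merge to {{(a=2, b=3)}} and {{(a=2, b=4)}} and are rejected by post-filtering.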
It may also be possible to avoid the special "nullary" postings list and rely
more heavily on RFP, but I think that will be a separate comment...
> Queries with multi-column replica-side filtering can miss rows
> --------------------------------------------------------------
>
> Key: CASSANDRA-19007
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19007
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Andres de la Peña
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> {{SELECT}} queries with multi-column replica-side filtering can miss rows if
> the filtered columns are spread across out-of-sync replicas. This dtest
> reproduces the issue:
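> {code:java}
> import java.io.IOException;
>
> import org.junit.Test;
>
> import org.apache.cassandra.distributed.Cluster;
> import org.apache.cassandra.distributed.test.TestBaseImpl;
>
> import static org.apache.cassandra.distributed.api.ConsistencyLevel.ALL;
> import static org.apache.cassandra.distributed.shared.AssertUtils.assertRows;
> import static org.apache.cassandra.distributed.shared.AssertUtils.row;
>
> // Class name is illustrative; TestBaseImpl supplies init() and withKeyspace().
> public class MultiColumnReplicaSideFilteringTest extends TestBaseImpl
> {
>     @Test
>     public void testMultiColumnReplicaSideFiltering() throws IOException
>     {
>         try (Cluster cluster = init(Cluster.build().withNodes(2).start()))
>         {
>             cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>
>             // Insert a split row: node 1 only sees column a, node 2 only sees column b.
>             cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>             cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>
>             String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2 ALLOW FILTERING");
>             Object[][] initialRows = cluster.coordinator(1).execute(select, ALL);
>             assertRows(initialRows, row(0, 1, 2)); // not found!!
>         }
>     }
> }
> {code}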
> This edge case affects queries using {{ALLOW FILTERING}} or any index
> implementation.
> It affects all branches since multi-column replica-side filtering queries
> were introduced, long before 3.0.
> The protection mechanism added by CASSANDRA-8272/8273 won't deal with this
> case, since it only solves single-column conflicts where stale rows could
> resurrect. This bug, however, doesn't resurrect data; it can only miss rows
> while the replicas are out of sync.
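The failure in the dtest above, and how the proposal in the comment would address it, can be modelled with a hypothetical Java sketch (not Cassandra code; {{SplitRowSketch}} and its methods are illustrative). Row {{k=0}} is split so that node 1 holds only {{a=1}} and node 2 only {{b=2}}. Both writes are un-repaired, so under the proposal each node matches through the nullary postings of the column it lacks. Timestamps are omitted because the two replicas hold disjoint columns.

{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the dtest scenario: one row split across two replicas.
final class SplitRowSketch
{
    // Current behaviour: each replica applies every clause to its own data (AND as intersection).
    static boolean allClausesMatch(Map<String, Integer> local, Map<String, Integer> clauses)
    {
        for (Map.Entry<String, Integer> clause : clauses.entrySet())
            if (!clause.getValue().equals(local.get(clause.getKey())))
                return false;
        return true;
    }

    // Proposed un-repaired behaviour: AND as union, plus a match for clause columns with no local value.
    static boolean anyClauseOrNullaryMatch(Map<String, Integer> local, Map<String, Integer> clauses)
    {
        if (local.isEmpty())
            return false;
        for (Map.Entry<String, Integer> clause : clauses.entrySet())
        {
            Integer value = local.get(clause.getKey());
            if (value == null || value.equals(clause.getValue()))
                return true;
        }
        return false;
    }

    // Coordinator: merge the partial rows that replicas returned, then post-filter the merged row.
    static Optional<Map<String, Integer>> query(List<Map<String, Integer>> replicas,
                                                Map<String, Integer> clauses, boolean proposed)
    {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> local : replicas)
        {
            boolean returned = proposed ? anyClauseOrNullaryMatch(local, clauses) : allClausesMatch(local, clauses);
            if (returned)
                merged.putAll(local);
        }
        return !merged.isEmpty() && allClausesMatch(merged, clauses) ? Optional.of(merged) : Optional.empty();
    }
}
{code}

With {{proposed = false}} neither replica returns anything and the row is missed, matching the {{// not found!!}} outcome; with {{proposed = true}} both partial rows reach the coordinator, merge to {{a=1, b=2}}, and pass post-filtering.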
--
This message was sent by Atlassian Jira
(v8.20.10#820010)