[jira] [Commented] (CASSANDRA-19007) Queries with multi-column replica-side filtering can miss rows

Caleb Rackliffe (Jira) Wed, 08 Nov 2023 21:02:06 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784279#comment-17784279
 ]


Caleb Rackliffe commented on CASSANDRA-19007:
---------------------------------------------

If we avoid writing the "nullary" postings lists, the examples above might 
look like this:

ex. We have a two node cluster, and the row {{(a=1, b=2)}} exists in the 
repaired set for both.
 - Node A receives the partial update {{(b=3)}} but node B does not.
 - We issue a query at ALL/QUORUM for {{a=1 AND b=3}}.
 - The repaired set index query against node B returns nothing, as we would 
expect.
 - The un-repaired set query against node B returns nothing, as no un-repaired 
data exists there, but node A returns a match, since the single clause {{b=3}} 
hits the partial update A received. (A returns a full row that includes data 
from the repaired set as well.)
 - At the coordinator, we now have a complete row from A, {{(a=1, b=3)}}, 
but we still don't have a QUORUM. Happily, this is exactly the case 
replica-filtering protection (RFP) protects us from, and it could be used to 
complete the QUORUM.
 - RFP would return a full row from node B, {{(a=1, b=2)}}, which would 
merge w/ node A's row to remain {{(a=1, b=3)}} and pass post-filtering.

ex. Let's assume node B had also received a partial update {{(a=2)}} in the 
example above.
 - We issue a query at ALL/QUORUM for {{a=1 AND b=3}}.
 - The repaired set index query against node B returns nothing, as we would 
expect.
 - The un-repaired set query against node B returns nothing, as {{a=2}} there 
and there is no nullary postings list, but node A returns a match, since the 
single clause {{b=3}} hits the partial update A received.
 - At the coordinator, we now have a complete row from A, {{(a=1, b=3)}}, but 
nothing else, as B returned nothing from either of the repaired/un-repaired 
queries.
 - RFP must then fetch the complete row from B, {{(a=2, b=2)}}, which merges 
w/ A to produce {{(a=2, b=3)}} and correctly fails post-filtering.
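
The coordinator-side merge and post-filter in the two examples above can be 
sketched roughly as follows. This is only an illustrative model, not Cassandra 
code: the {{Cell}} class, the {{merge}} and {{passes}} helpers, and the 
timestamps (1 for the repaired write, 2 for each partial update) are all 
assumptions made up for the sketch, and reconciliation is reduced to per-column 
last-write-wins.

```java
import java.util.HashMap;
import java.util.Map;

public class CoordinatorMergeSketch {
    /** Hypothetical cell: a column value tagged with its write timestamp. */
    static final class Cell {
        final int value;
        final long timestamp;
        Cell(int value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    /** Per-column last-write-wins merge of two replica responses (ties keep the left cell). */
    static Map<String, Cell> merge(Map<String, Cell> left, Map<String, Cell> right) {
        Map<String, Cell> merged = new HashMap<>(left);
        right.forEach((column, cell) ->
            merged.merge(column, cell, (x, y) -> y.timestamp > x.timestamp ? y : x));
        return merged;
    }

    /** Post-filter: every clause of the query must hold on the merged row. */
    static boolean passes(Map<String, Cell> row, Map<String, Integer> query) {
        return query.entrySet().stream().allMatch(clause -> {
            Cell cell = row.get(clause.getKey());
            return cell != null && cell.value == clause.getValue();
        });
    }

    public static void main(String[] args) {
        Map<String, Integer> query = Map.of("a", 1, "b", 3);
        // Node A's full row: repaired a=1 (ts 1) plus the partial update b=3 (ts 2).
        Map<String, Cell> fromA = Map.of("a", new Cell(1, 1), "b", new Cell(3, 2));

        // First example: RFP fetches the repaired row (a=1, b=2) from node B.
        Map<String, Cell> fromB1 = Map.of("a", new Cell(1, 1), "b", new Cell(2, 1));
        System.out.println(passes(merge(fromA, fromB1), query)); // true

        // Second example: node B also holds the partial update a=2 (ts 2).
        Map<String, Cell> fromB2 = Map.of("a", new Cell(2, 2), "b", new Cell(2, 1));
        System.out.println(passes(merge(fromA, fromB2), query)); // false
    }
}
```

In the second case the merged row is {{(a=2, b=3)}}, so the match node A 
reported on its own is correctly discarded once B's newer {{a}} is taken into 
account.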

ex. Finally, let's say node A had instead received the partial update 
{{(b=4)}} in the previous example.
 - We issue a query at ALL/QUORUM for {{a=1 AND b=3}}.
 - The repaired set index query against node B returns nothing, as we would 
expect.
 - The un-repaired set query against node B returns nothing, as {{a=2}} there 
and there is no nullary postings list, and node A also returns nothing, because 
there is no partial match on {{b}} and no nullary postings list to match on 
{{a}}.
 - At the coordinator, we have no matches from repaired data, and there isn't 
even a single-clause match elsewhere on un-repaired data, so there's no way we 
can produce a match, and we return an empty result to the client.
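
To make the replica-side behavior in these three examples concrete, here is a 
minimal sketch of the matching rule as I read it from the bullets above: the 
repaired set must satisfy every clause, while the un-repaired set only needs 
one satisfied clause, and a column that is absent from un-repaired data never 
matches (i.e. no nullary postings list). The class and method names are 
hypothetical and this is not how SAI is actually structured.

```java
import java.util.Map;

public class ReplicaMatchSketch {
    /** Repaired data is in sync across replicas, so a row must satisfy every clause. */
    static boolean repairedMatches(Map<String, Integer> row, Map<String, Integer> query) {
        return query.entrySet().stream()
                .allMatch(clause -> clause.getValue().equals(row.get(clause.getKey())));
    }

    /**
     * Un-repaired data may be partial, so a single satisfied clause is enough.
     * A column missing from the partial update matches nothing, which models
     * the absence of a nullary postings list.
     */
    static boolean unrepairedMatches(Map<String, Integer> partial, Map<String, Integer> query) {
        return query.entrySet().stream()
                .anyMatch(clause -> clause.getValue().equals(partial.get(clause.getKey())));
    }

    public static void main(String[] args) {
        Map<String, Integer> query = Map.of("a", 1, "b", 3);

        // Repaired row (a=1, b=2), identical on both nodes.
        System.out.println(repairedMatches(Map.of("a", 1, "b", 2), query)); // false

        // Node A's partial update b=3 (first and second examples).
        System.out.println(unrepairedMatches(Map.of("b", 3), query)); // true

        // Node B's partial update a=2 (second and third examples).
        System.out.println(unrepairedMatches(Map.of("a", 2), query)); // false

        // Node A's partial update b=4 (third example).
        System.out.println(unrepairedMatches(Map.of("b", 4), query)); // false
    }
}
```

Only the {{b=3}} update produces a candidate for the coordinator, which is why 
RFP has something to complete in the first two examples and nothing at all in 
the third.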

(Note that, in any of these cases, returning an actual row match from the 
repaired set query, combined w/ no results from the un-repaired set, would 
require RFP to ensure the result from the repaired set isn't stale...I think.)

 

I'll try to actually codify these cases and maybe a few more as tests tomorrow, 
just for my sanity and to prove they all fail right now w/ SAI or plain 
filtering....

> Queries with multi-column replica-side filtering can miss rows
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-19007
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19007
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Andres de la Peña
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>
> {{SELECT}} queries with multi-column replica-side filtering can miss rows if 
> the filtered columns are spread across out-of-sync replicas. This dtest 
> reproduces the issue:
> {code:java}
> @Test
> public void testMultiColumnReplicaSideFiltering() throws IOException
> {
>     try (Cluster cluster = init(Cluster.build().withNodes(2).start()))
>     {
>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>         // insert a split row
>         cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>         cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>         String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2 ALLOW FILTERING");
>         Object[][] initialRows = cluster.coordinator(1).execute(select, ALL);
>         assertRows(initialRows, row(0, 1, 2)); // not found!!
>     }
> }
> {code}
> This edge case affects queries using {{ALLOW FILTERING}} or any index 
> implementation.
> It affects all branches since multi-column replica-side filtering queries 
> were introduced, long before 3.0.
> The protection mechanism added by CASSANDRA-8272/8273 won't deal with this 
> case, since it only solves single-column conflicts where stale rows could 
> resurrect. This bug, however, doesn't resurrect data; it can only miss rows 
> while the replicas are out-of-sync.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
