[jira] [Commented] (CASSANDRA-19007) Queries with multi-column replica-side filtering can miss rows

Caleb Rackliffe (Jira) Wed, 20 Mar 2024 08:26:13 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17829211#comment-17829211
 ]


Caleb Rackliffe commented on CASSANDRA-19007:
---------------------------------------------

[~Bereng] I wouldn't say that an actual solution (which I had [started to 
explore|https://github.com/apache/cassandra/pull/3155] along with 
CASSANDRA-19018) should block 5.0, as this has been broken since the beginning 
of time for normal filtering queries. What I think we should do at this point 
is just spin off a separate Jira to emit a client warning or put a guardrail 
in place if a user attempts a read that filters on multiple mutable/regular 
columns (without SAI, which is fixed, or at least without an index) at a 
consistency level that requires coordinator resolution.
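
To make the proposed condition concrete, here is a minimal sketch of the check such a warning might perform. The class, method, and parameter names are hypothetical and are not Cassandra APIs, and "requires coordinator resolution" is assumed here to mean that more than one replica response gets merged:

```java
import java.util.Set;

public class MultiColumnFilteringWarningSketch
{
    /**
     * Hypothetical check, not Cassandra's API.
     *
     * @param filteredRegularColumns regular (mutable) columns restricted by the filter
     * @param servedBySai            whether the restrictions are served by SAI
     * @param replicasConsulted      replicas whose responses the coordinator merges
     *                               (assumed proxy for the consistency level)
     */
    static boolean shouldWarn(Set<String> filteredRegularColumns, boolean servedBySai, int replicasConsulted)
    {
        boolean multiColumn = filteredRegularColumns.size() > 1;
        boolean coordinatorResolves = replicasConsulted > 1;
        return multiColumn && !servedBySai && coordinatorResolves;
    }

    public static void main(String[] args)
    {
        System.out.println(shouldWarn(Set.of("a", "b"), false, 2)); // true
        System.out.println(shouldWarn(Set.of("a", "b"), false, 1)); // false
    }
}
```

A real guardrail would presumably derive these inputs from the read command and the replica plan rather than take them as arguments.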

We can keep this Jira to track an actual fix, which honestly might not be that 
bad, given how much of the dirty work CASSANDRA-19018 has already done in 
terms of testing infrastructure and all the fixes to RFP (replica filtering 
protection).

If that all makes sense, feel free to open that Jira, or let me know if 
you'd like me to, and I will.

> Queries with multi-column replica-side filtering can miss rows
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-19007
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19007
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Andres de la Peña
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 5.0.x, 5.x
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{SELECT}} queries with multi-column replica-side filtering can miss rows if 
> the filtered columns are spread across out-of-sync replicas. This dtest 
> reproduces the issue:
> {code:java}
> @Test
> public void testMultiColumnReplicaSideFiltering() throws IOException
> {
>     try (Cluster cluster = init(Cluster.build().withNodes(2).start()))
>     {
>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>         // insert a split row
>         cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>         cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>         String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2 ALLOW FILTERING");
>         Object[][] initialRows = cluster.coordinator(1).execute(select, ALL);
>         assertRows(initialRows, row(0, 1, 2)); // not found!!
>     }
> }
> {code}
> This edge case affects queries using {{ALLOW FILTERING}} or any index 
> implementation.
> It affects all branches since multi-column replica-side filtering queries 
> were introduced, long before 3.0.
> The protection mechanism added by CASSANDRA-8272/8273 doesn't cover this 
> case, since it only resolves single-column conflicts where stale rows could 
> be resurrected. This bug, however, doesn't resurrect data; it can only miss 
> rows while the replicas are out of sync.
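
The failure mode in the description can be illustrated without a cluster. The following is a minimal standalone sketch, not Cassandra code: replicas are modeled as plain maps, all class and method names are hypothetical, and the merge ignores write timestamps (safe here only because the two replicas hold disjoint columns). It contrasts filtering on each replica before merging with merging first and filtering afterwards:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitRowFilteringSketch
{
    // The predicate from the dtest: a = 1 AND b = 2. A missing column never matches.
    static boolean matches(Map<String, Integer> row)
    {
        return Integer.valueOf(1).equals(row.get("a")) && Integer.valueOf(2).equals(row.get("b"));
    }

    // Two out-of-sync replicas, each holding one half of the row with key 0.
    static List<Map<Integer, Map<String, Integer>>> splitRowReplicas()
    {
        Map<Integer, Map<String, Integer>> node1 = new HashMap<>();
        node1.put(0, Map.of("a", 1));
        Map<Integer, Map<String, Integer>> node2 = new HashMap<>();
        node2.put(0, Map.of("b", 2));
        return List.of(node1, node2);
    }

    // Replica-side filtering: each replica applies the predicate to its local view only.
    static List<Integer> filterThenMerge(List<Map<Integer, Map<String, Integer>>> replicas)
    {
        List<Integer> keys = new ArrayList<>();
        for (Map<Integer, Map<String, Integer>> replica : replicas)
            for (Map.Entry<Integer, Map<String, Integer>> e : replica.entrySet())
                if (matches(e.getValue()) && !keys.contains(e.getKey()))
                    keys.add(e.getKey());
        return keys;
    }

    // Reconcile the replicas' views of each row first, then apply the predicate.
    static List<Integer> mergeThenFilter(List<Map<Integer, Map<String, Integer>>> replicas)
    {
        Map<Integer, Map<String, Integer>> merged = new TreeMap<>();
        for (Map<Integer, Map<String, Integer>> replica : replicas)
            for (Map.Entry<Integer, Map<String, Integer>> e : replica.entrySet())
                merged.computeIfAbsent(e.getKey(), k -> new HashMap<>()).putAll(e.getValue());

        List<Integer> keys = new ArrayList<>();
        for (Map.Entry<Integer, Map<String, Integer>> e : merged.entrySet())
            if (matches(e.getValue()))
                keys.add(e.getKey());
        return keys;
    }

    public static void main(String[] args)
    {
        List<Map<Integer, Map<String, Integer>>> replicas = splitRowReplicas();
        System.out.println(filterThenMerge(replicas)); // prints []
        System.out.println(mergeThenFilter(replicas)); // prints [0]
    }
}
```

Neither replica sees both `a = 1` and `b = 2` locally, so neither returns the row and the coordinator has nothing to reconcile, which matches the empty result in the dtest.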



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
