[jira] [Commented] (CASSANDRA-19007) Queries with multi-column replica-side filtering can miss rows

Caleb Rackliffe (Jira) Wed, 08 Nov 2023 11:28:07 -0800


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784183#comment-17784183
 ]


Caleb Rackliffe commented on CASSANDRA-19007:
---------------------------------------------

{quote}The only approach that comes to mind is applying replica filtering only 
on one of the columns and doing the rest of the filtering on the coordinator, 
after reconciliation. But that would be very expensive.
{quote}
That would amount to evaluating every multi-column AND query as an OR query at 
the replicas, applying the full filter at the coordinator after reconciliation, 
and relying on short-read protection to fetch enough results to satisfy the 
limit. Yeah, pretty disastrous for performance.
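
To make the trade-off concrete, here is a minimal toy sketch of the split row from the dtest. It is not Cassandra code: the class, the method names and the map-based "replicas" are all made up for illustration, and reconciliation is reduced to merging columns with no timestamps or tombstones. It only shows why AND at the replicas misses the row while OR at the replicas plus AND at the coordinator finds it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not Cassandra internals.
final class SplitRowFilteringSketch
{
    /** Toy replicas from the dtest: node 1 only has a = 1, node 2 only has b = 2, both for k = 0. */
    static List<Map<Integer, Map<String, Integer>>> splitRowReplicas()
    {
        return List.of(Map.of(0, Map.of("a", 1)),
                       Map.of(0, Map.of("b", 2)));
    }

    /** Rows that satisfy all (AND) or at least one (OR) of the equality restrictions. */
    static Map<Integer, Map<String, Integer>> filter(Map<Integer, Map<String, Integer>> rows,
                                                     Map<String, Integer> where,
                                                     boolean all)
    {
        Map<Integer, Map<String, Integer>> matches = new TreeMap<>();
        rows.forEach((key, row) -> {
            long hits = where.entrySet().stream()
                             .filter(e -> e.getValue().equals(row.get(e.getKey())))
                             .count();
            if (all ? hits == where.size() : hits > 0)
                matches.put(key, row);
        });
        return matches;
    }

    /** Toy reconciliation: merge the columns returned for each key. */
    static Map<Integer, Map<String, Integer>> reconcile(List<Map<Integer, Map<String, Integer>>> responses)
    {
        Map<Integer, Map<String, Integer>> merged = new TreeMap<>();
        for (Map<Integer, Map<String, Integer>> response : responses)
            response.forEach((key, row) -> merged.computeIfAbsent(key, k -> new TreeMap<>()).putAll(row));
        return merged;
    }

    /** Current behavior: every replica applies the full AND filter before responding. */
    static List<Integer> andAtReplicas(List<Map<Integer, Map<String, Integer>>> replicas,
                                       Map<String, Integer> where)
    {
        List<Map<Integer, Map<String, Integer>>> responses = new ArrayList<>();
        for (Map<Integer, Map<String, Integer>> replica : replicas)
            responses.add(filter(replica, where, true));
        return new ArrayList<>(reconcile(responses).keySet());
    }

    /** Alternative discussed above: OR at the replicas, full AND filter after reconciliation. */
    static List<Integer> orAtReplicasThenAndAtCoordinator(List<Map<Integer, Map<String, Integer>>> replicas,
                                                          Map<String, Integer> where)
    {
        List<Map<Integer, Map<String, Integer>>> responses = new ArrayList<>();
        for (Map<Integer, Map<String, Integer>> replica : replicas)
            responses.add(filter(replica, where, false));
        return new ArrayList<>(filter(reconcile(responses), where, true).keySet());
    }

    public static void main(String[] args)
    {
        Map<String, Integer> where = Map.of("a", 1, "b", 2);
        System.out.println(andAtReplicas(splitRowReplicas(), where));                    // prints []
        System.out.println(orAtReplicasThenAndAtCoordinator(splitRowReplicas(), where)); // prints [0]
    }
}
```

The second variant finds the row only because each replica returns everything that matches any single restriction, which is exactly where the cost comes from.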

For SAI, the hard thing about the local index query is that the SSTable/column 
indexes only store matches. When an index produces no posting for a row, we 
can't tell whether the row has no value for that column locally (as in the 
partial update case) or whether its value simply doesn't match our expression. 
We could potentially change that (for SAI specifically) by actually storing 
postings for "missing" column data in column indexes, but that's obviously not 
free either.
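
As a rough illustration of the "missing" postings idea, the sketch below models a column index that tracks both matching row ids and row ids with no local value. This is an assumption-heavy toy, not SAI's on-disk format or API: {{ColumnIndex}}, {{possibleMatches}} and {{candidates}} are hypothetical names, and how such candidates would then be resolved at the coordinator is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, not SAI internals.
final class MissingPostingsSketch
{
    /** Per-column index: value -> row ids, plus the row ids that have no local value for the column. */
    static final class ColumnIndex
    {
        final Map<Integer, Set<Integer>> postings = new HashMap<>();
        final Set<Integer> missing = new TreeSet<>();

        void add(int rowId, Integer value)
        {
            if (value == null)
                missing.add(rowId); // the extra posting an index does not store today
            else
                postings.computeIfAbsent(value, v -> new TreeSet<>()).add(rowId);
        }

        /** Rows whose local value is known to match. */
        Set<Integer> matches(int value)
        {
            return postings.getOrDefault(value, Set.of());
        }

        /** Rows that match, or that have no local value and so cannot be ruled out locally. */
        Set<Integer> possibleMatches(int value)
        {
            Set<Integer> result = new TreeSet<>(matches(value));
            result.addAll(missing);
            return result;
        }
    }

    /** Local candidates for "a = aValue AND b = bValue", with or without the "missing" postings. */
    static Set<Integer> candidates(ColumnIndex a, int aValue, ColumnIndex b, int bValue, boolean useMissing)
    {
        Set<Integer> result = new TreeSet<>(useMissing ? a.possibleMatches(aValue) : a.matches(aValue));
        result.retainAll(useMissing ? b.possibleMatches(bValue) : b.matches(bValue));
        return result;
    }

    public static void main(String[] args)
    {
        // Node 1 from the dtest: row 0 has a = 1 and no value for b.
        ColumnIndex a = new ColumnIndex();
        ColumnIndex b = new ColumnIndex();
        a.add(0, 1);
        b.add(0, null);
        System.out.println(candidates(a, 1, b, 2, false)); // prints []
        System.out.println(candidates(a, 1, b, 2, true));  // prints [0]
    }
}
```

The extra {{missing}} set is the cost mentioned above: every row without a value for an indexed column needs a posting of its own.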

 

CC [~mikea] 

> Queries with multi-column replica-side filtering can miss rows
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-19007
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19007
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Andres de la Peña
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>
> {{SELECT}} queries with multi-column replica-side filtering can miss rows if 
> the filtered columns are spread across out-of-sync replicas. This dtest 
> reproduces the issue:
> {code:java}
> @Test
> public void testMultiColumnReplicaSideFiltering() throws IOException
> {
>     try (Cluster cluster = init(Cluster.build().withNodes(2).start()))
>     {
>         cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>         // insert a split row
>         cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>         cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>         String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2 ALLOW FILTERING");
>         Object[][] initialRows = cluster.coordinator(1).execute(select, ALL);
>         assertRows(initialRows, row(0, 1, 2)); // not found!!
>     }
> }
> {code}
> This edge case affects queries using {{ALLOW FILTERING}} or any index 
> implementation.
> It affects all branches since multi-column replica-side filtering queries 
> were introduced, long before 3.0.
> The protection mechanism added by CASSANDRA-8272/8273 won't deal with this 
> case, since it only solves single-column conflicts where stale rows could 
> resurrect. This bug, however, doesn't resurrect data; it can only miss rows 
> while the replicas are out of sync.


