[
https://issues.apache.org/jira/browse/CASSANDRA-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786966#comment-17786966
]
Caleb Rackliffe commented on CASSANDRA-19018:
---------------------------------------------
[~adelapena] End-to-end there are three places where we'll have to think about
filtering. Working backwards: coordinator resolution, the local table-level
read/query, and the SAI SSTable index queries.
(Note that we should have sufficient information in the query context to
short-circuit this entire process if the query type makes it unnecessary.)
1.) SSTable-level
When we have an AND query across two columns, we produce a stream of primary
keys for each clause (i.e. one for each column). These streams are the union of
all the SSTable index matches for their respective columns, in order. (See
{{QueryController#getIndexQueryResults()}}) The point where things start to go
wrong in the current implementation is when we attempt to AND these two
unions/streams together. We assume a repaired view of the row at all times, so
an intersection culls partial matches that may still be needed to resolve a
final result at the coordinator. How do we fix this?
The approach we've talked about here is, I think, splitting up the
{{QueryView}} into repaired and un-repaired SSTables (i.e. SSTable indexes).
Once this is done, we can query the repaired set and produce the intersection
exactly as before. For the un-repaired set, though, we also need to produce
primary keys for the partial matches. We can simply produce a union instead of an
intersection here, but we'd be heavily reliant on the un-repaired set being
small to keep the number of unnecessarily matched rows minimal. (Trying to
optimize this might not be straightforward. Having a special postings list for
missing column values might allow for an intersection instead of a union here,
but I haven't thought through all its implications for overall correctness.)
Indexes on clustering keys are a special case here. Since they must always be
present, we should never have to union them together with anything. Even with
more than one normal column in the query, the normal-column clauses can be OR'd
together and then AND'd with the clause that touches the clustering key. There are cases where
this can whittle down the number of results that pass to the next stage.
Once we've queried both repaired and un-repaired sets, we can union the two
streams of primary keys together to move to the next step...
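To make the shape of step 1 concrete, here's a minimal toy sketch (class and method names are mine, not actual SAI code; primary keys are modeled as plain longs in sorted sets rather than real token/key streams): intersect the per-column matches from the repaired set, union the per-column matches from the un-repaired set, then union the two results into the candidate stream.

```java
import java.util.*;

// Toy model of step 1. Each per-column "stream" is a sorted set of primary
// keys matched by that column's SSTable indexes. Illustrative names only.
public class QueryViewSketch
{
    // Repaired set: rows are fully reconciled, so AND clauses via intersection.
    static SortedSet<Long> intersect(List<SortedSet<Long>> perColumn)
    {
        SortedSet<Long> result = new TreeSet<>(perColumn.get(0));
        for (SortedSet<Long> s : perColumn.subList(1, perColumn.size()))
            result.retainAll(s);
        return result;
    }

    // Un-repaired set: a key matching any single clause may be half of a
    // split row, so union to keep every partial match.
    static SortedSet<Long> union(List<SortedSet<Long>> perColumn)
    {
        SortedSet<Long> result = new TreeSet<>();
        perColumn.forEach(result::addAll);
        return result;
    }

    public static void main(String[] args)
    {
        List<SortedSet<Long>> repaired = List.of(new TreeSet<>(List.of(1L, 2L, 3L)),
                                                 new TreeSet<>(List.of(2L, 3L, 4L)));
        List<SortedSet<Long>> unrepaired = List.of(new TreeSet<>(List.of(5L)),
                                                   new TreeSet<>(List.of(6L)));
        SortedSet<Long> candidates = new TreeSet<>();
        candidates.addAll(intersect(repaired));   // full matches from repaired data
        candidates.addAll(union(unrepaired));     // possible partial matches
        System.out.println(candidates);           // [2, 3, 5, 6]
    }
}
```

As the comment notes, this is only workable if the un-repaired set stays small, since the union side passes every partial match on to filtering.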
2.) Table-level
Once we've produced a set of primary keys that may (or may not) be matches, we
need to read and filter the rows. If we know we're potentially dealing w/
partial update reconciliation, we obviously can't just apply the filter as we
do now in {{ResultRetriever#applyIndexFilter()}}. Even preserving possible
matches from the SAI indexes themselves, we can still prematurely discard a
match by filtering out a row when one of the columns in our AND query is simply
missing a value. So what do we need to keep here, as we're now filtering
matches from both the repaired and un-repaired sets?
(Here's where the ideas in your previous comment enter...)
The easiest thing to keep is a row that actually matches all clauses, {{a=x AND
b=y}}. The easiest thing to throw away, which we do now, is a row delete. After
that, all we need to do is make sure we keep partial matches when the
timestamps of the normal columns involved in the query aren't the same. (This
is similar to the {{(a=x AND b=y) OR (a=x AND TIMESTAMP(a) > TIMESTAMP(b)) OR
(b=y AND TIMESTAMP(b) > TIMESTAMP(a))}} suggestion I think.) We might also want
to break the matching logic in {{FilterTree}} down a bit around this, because
we only care about the timestamps of the non-primary key columns. Primary key
elements (i.e. for indexed clustering keys) in an AND query can be evaluated
first and fail matches before we even look at potentially problematic normal
columns.
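As a rough sketch of that keep/discard rule for a two-clause query {{a=x AND b=y}} (again, illustrative names and a toy cell model, not the actual {{FilterTree}} code): keep a full match, drop a row delete or plain non-match, and keep a partial match when the matching column's timestamp is newer than the other column's.

```java
// Toy sketch of the step-2 filter. A Cell is (value, writetime); a null cell
// means the column has no value in this locally read row.
public class PartialMatchFilter
{
    record Cell(int value, long timestamp) {}

    // Query: a = x AND b = y. Keep the row if it's a full local match, or a
    // partial match whose matching column is newer than the other column
    // (the missing/older side may be completed on another replica).
    static boolean keep(Cell a, Cell b, int x, int y)
    {
        boolean aMatches = a != null && a.value() == x;
        boolean bMatches = b != null && b.value() == y;
        if (aMatches && bMatches)
            return true; // full match: a=x AND b=y
        long tsA = a == null ? Long.MIN_VALUE : a.timestamp();
        long tsB = b == null ? Long.MIN_VALUE : b.timestamp();
        // (a=x AND TIMESTAMP(a) > TIMESTAMP(b)) OR (b=y AND TIMESTAMP(b) > TIMESTAMP(a))
        return (aMatches && tsA > tsB) || (bMatches && tsB > tsA);
    }

    public static void main(String[] args)
    {
        System.out.println(keep(new Cell(1, 10), new Cell(2, 20), 1, 2)); // full match
        System.out.println(keep(new Cell(1, 10), null, 1, 2));            // split row: keep
        System.out.println(keep(new Cell(9, 30), new Cell(2, 10), 1, 2)); // b older than non-matching a: drop
    }
}
```

Per the comment, indexed clustering-key clauses would be evaluated before this, since primary key elements are always present and can fail a match outright without any timestamp reasoning.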
3.) Coordinator-level
This should be the easiest part of this whole project, and looking more at
{{DataResolver}}, I think the final filtering we need before handing off to the
client is already in place in
{{DataResolver#resolveWithReplicaFilteringProtection()}}. This, like everything
else here, needs to be thoroughly tested.
WDYT?
Either way, I'm going to start working on a testing framework that tries to
cover the whole space around partial updates. (Different consistency levels,
interactions of existing data and partial updates, key/static/normal columns,
interactions w/ read-repair and normal repair, cases that currently hit RFP,
etc.)
> An SAI-specific mechanism to ensure consistency isn't violated for
> multi-column (i.e. AND) queries at CL > ONE
> --------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19018
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19018
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Feature/SAI
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CASSANDRA-19007 is going to be where we add a guardrail around
> filtering/index queries that use intersection/AND over partially updated
> non-key columns. (ex. Restricting one clustering column and one normal column
> does not cause a consistency problem, as primary keys cannot be partially
> updated.) This issue exists to attempt to fix this specifically for SAI in
> 5.0.x, as Accord will (last I checked) not be available until the 5.1 release.
> The SAI-specific version of the originally reported issue is this:
> {noformat}
> try (Cluster cluster = init(Cluster.build(2).withConfig(config -> config.with(GOSSIP).with(NETWORK)).start()))
> {
>     cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(a) USING 'sai'"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(b) USING 'sai'"));
>
>     // insert a split row
>     cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>     cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>
>     // Uncomment this line and the test succeeds w/ partial writes completed...
>     //cluster.get(1).nodetoolResult("repair", KEYSPACE).asserts().success();
>
>     String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2");
>     Object[][] initialRows = cluster.coordinator(1).execute(select, ConsistencyLevel.ALL);
>     assertRows(initialRows, row(0, 1, 2)); // not found!!
> }
> {noformat}
> To make a long story short, the local SAI indexes are hiding local partial
> matches from the coordinator that would combine there to form full matches.
> Simple non-index filtering queries also suffer from this problem, but they
> hide the partial matches in a different way. I'll outline a possible solution
> for this in the comments that takes advantage of replica filtering protection
> and the repaired/unrepaired datasets...and attempts to minimize the amount of
> extra row data sent to the coordinator.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]