[
https://issues.apache.org/jira/browse/CASSANDRA-19018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786966#comment-17786966
]
Caleb Rackliffe commented on CASSANDRA-19018:
---------------------------------------------
[~adelapena] End-to-end there are three places where we'll have to think about
filtering. Working backwards: coordinator resolution, the local table-level
read/query, and the SAI SSTable index queries.
(Note that we should have sufficient information in the query context to
short-circuit this entire process if the query type makes it unnecessary.)
1.) SSTable-level
When we have an AND query across two columns, we produce a stream of primary
keys for each clause (i.e. one for each column). These streams are the union of
all the SSTable index matches for their respective columns, in order. (See
{{QueryController#getIndexQueryResults()}}) The point where things start to go
wrong in the current implementation is when we attempt to AND these two
unions/streams together. We assume a repaired view of the row at all times, so
an intersection culls partial matches that may still be needed to resolve a
final result at the coordinator. How do we fix this?
The approach we've talked about here is, I think, splitting up the
{{QueryView}} into repaired and un-repaired SSTables (i.e. SSTable indexes).
Once this is done, we can query the repaired set and produce the intersection
exactly as before. For the un-repaired set, though, we also need to produce
primary keys for the partial matches. We can simply produce a union instead of an
intersection here, but we'd be heavily reliant on the un-repaired set being
small to keep the number of unnecessarily matched rows minimal. (Trying to
optimize this might not be straightforward. Having a special postings list for
missing column values might allow for an intersection instead of a union here,
but I haven't thought through all its implications for overall correctness.)
Indexes on clustering keys are a special case here. Since they must always be
present, we should never have to union them together with anything. Even with
more than one normal column in the query, the normal-column clauses can be OR'd
together and then AND'd with the clause that touches the clustering key. There are cases where
this can whittle down the number of results that pass to the next stage.
Once we've queried both repaired and un-repaired sets, we can union the two
streams of primary keys together to move to the next step...
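To make the shape of step 1 concrete, here's a minimal toy sketch (class and method names are mine, not actual SAI code; primary keys are modeled as plain longs in sorted sets rather than real token/key streams): intersect the per-column matches from the repaired set, union the per-column matches from the un-repaired set, then union the two results into the candidate stream.

```java
import java.util.*;

// Toy model of step 1. Each per-column "stream" is a sorted set of primary
// keys matched by that column's SSTable indexes. Illustrative names only.
public class QueryViewSketch
{
    // Repaired set: rows are fully reconciled, so AND clauses via intersection.
    static SortedSet<Long> intersect(List<SortedSet<Long>> perColumn)
    {
        SortedSet<Long> result = new TreeSet<>(perColumn.get(0));
        for (SortedSet<Long> s : perColumn.subList(1, perColumn.size()))
            result.retainAll(s);
        return result;
    }

    // Un-repaired set: a key matching any single clause may be half of a
    // split row, so union to keep every partial match.
    static SortedSet<Long> union(List<SortedSet<Long>> perColumn)
    {
        SortedSet<Long> result = new TreeSet<>();
        perColumn.forEach(result::addAll);
        return result;
    }

    public static void main(String[] args)
    {
        List<SortedSet<Long>> repaired = List.of(new TreeSet<>(List.of(1L, 2L, 3L)),
                                                 new TreeSet<>(List.of(2L, 3L, 4L)));
        List<SortedSet<Long>> unrepaired = List.of(new TreeSet<>(List.of(5L)),
                                                   new TreeSet<>(List.of(6L)));
        SortedSet<Long> candidates = new TreeSet<>();
        candidates.addAll(intersect(repaired));   // full matches from repaired data
        candidates.addAll(union(unrepaired));     // possible partial matches
        System.out.println(candidates);           // [2, 3, 5, 6]
    }
}
```

As the comment notes, this is only workable if the un-repaired set stays small, since the union side passes every partial match on to filtering.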
2.) Table-level
Once we've produced a set of primary keys that may (or may not) be matches, we
need to read and filter the rows. If we know we're potentially dealing w/
partial update reconciliation, we obviously can't just apply the filter as we
do now in {{ResultRetriever#applyIndexFilter()}}. Even preserving possible
matches from the SAI indexes themselves, we can still prematurely discard a
match by filtering out a row when one of the columns in our AND query is simply
missing a value. So what do we need to keep here, as we're now filtering
matches from both the repaired and un-repaired sets?
(Here's where the ideas in your previous comment enter...)
The easiest thing to keep is a row that actually matches all clauses, {{a=x AND
b=y}}. The easiest thing to throw away, which we do now, is a row delete. After
that, all we need to do is make sure we keep partial matches when the
timestamps of the normal columns involved in the query aren't the same. (This
is similar to the {{(a=x AND b=y) OR (a=x AND TIMESTAMP(a) > TIMESTAMP(b)) OR
(b=y AND TIMESTAMP(b) > TIMESTAMP(a))}} suggestion I think.) We might also want
to break the matching logic in {{FilterTree}} down a bit around this, because
we only care about the timestamps of the non-primary key columns. Primary key
elements (i.e. for indexed clustering keys) in an AND query can be evaluated
first and fail matches before we even look at potentially problematic normal
columns.
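As a rough sketch of that keep/discard rule for a two-clause query {{a=x AND b=y}} (again, illustrative names and a toy cell model, not the actual {{FilterTree}} code): keep a full match, drop a row delete or plain non-match, and keep a partial match when the matching column's timestamp is newer than the other column's.

```java
// Toy sketch of the step-2 filter. A Cell is (value, writetime); a null cell
// means the column has no value in this locally read row.
public class PartialMatchFilter
{
    record Cell(int value, long timestamp) {}

    // Query: a = x AND b = y. Keep the row if it's a full local match, or a
    // partial match whose matching column is newer than the other column
    // (the missing/older side may be completed on another replica).
    static boolean keep(Cell a, Cell b, int x, int y)
    {
        boolean aMatches = a != null && a.value() == x;
        boolean bMatches = b != null && b.value() == y;
        if (aMatches && bMatches)
            return true; // full match: a=x AND b=y
        long tsA = a == null ? Long.MIN_VALUE : a.timestamp();
        long tsB = b == null ? Long.MIN_VALUE : b.timestamp();
        // (a=x AND TIMESTAMP(a) > TIMESTAMP(b)) OR (b=y AND TIMESTAMP(b) > TIMESTAMP(a))
        return (aMatches && tsA > tsB) || (bMatches && tsB > tsA);
    }

    public static void main(String[] args)
    {
        System.out.println(keep(new Cell(1, 10), new Cell(2, 20), 1, 2)); // full match
        System.out.println(keep(new Cell(1, 10), null, 1, 2));            // split row: keep
        System.out.println(keep(new Cell(9, 30), new Cell(2, 10), 1, 2)); // b older than non-matching a: drop
    }
}
```

Per the comment, indexed clustering-key clauses would be evaluated before this, since primary key elements are always present and can fail a match outright without any timestamp reasoning.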
3.) Coordinator-level
This should be the easiest part of this whole project, and looking more at
{{DataResolver}}, I think the final filtering we need before handing off to the
client is already in place in
{{DataResolver#resolveWithReplicaFilteringProtection()}}. This, like everything
else here, needs to be thoroughly tested.
WDYT?
Either way, I'm going to start working on a testing framework that tries to
cover the whole space around partial updates. (Different consistency levels,
interactions of existing data and partial updates, key/static/normal columns,
interactions w/ read-repair and normal repair, cases that currently hit RFP,
etc.)
> An SAI-specific mechanism to ensure consistency isn't violated for
> multi-column (i.e. AND) queries at CL > ONE
> --------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19018
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19018
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination, Feature/SAI
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 5.0.x, 5.x
>
>
> CASSANDRA-19007 is going to be where we add a guardrail around
> filtering/index queries that use intersection/AND over partially updated
> non-key columns. (ex. Restricting one clustering column and one normal column
> does not cause a consistency problem, as primary keys cannot be partially
> updated.) This issue exists to attempt to fix this specifically for SAI in
> 5.0.x, as Accord will (last I checked) not be available until the 5.1 release.
> The SAI-specific version of the originally reported issue is this:
> {noformat}
> try (Cluster cluster = init(Cluster.build(2).withConfig(config -> config.with(GOSSIP).with(NETWORK)).start()))
> {
>     cluster.schemaChange(withKeyspace("CREATE TABLE %s.t (k int PRIMARY KEY, a int, b int)"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(a) USING 'sai'"));
>     cluster.schemaChange(withKeyspace("CREATE INDEX ON %s.t(b) USING 'sai'"));
>
>     // insert a split row
>     cluster.get(1).executeInternal(withKeyspace("INSERT INTO %s.t(k, a) VALUES (0, 1)"));
>     cluster.get(2).executeInternal(withKeyspace("INSERT INTO %s.t(k, b) VALUES (0, 2)"));
>
>     // Uncomment this line and the test succeeds w/ partial writes completed...
>     //cluster.get(1).nodetoolResult("repair", KEYSPACE).asserts().success();
>
>     String select = withKeyspace("SELECT * FROM %s.t WHERE a = 1 AND b = 2");
>     Object[][] initialRows = cluster.coordinator(1).execute(select, ConsistencyLevel.ALL);
>     assertRows(initialRows, row(0, 1, 2)); // not found!!
> }
> {noformat}
> To make a long story short, the local SAI indexes are hiding local partial
> matches from the coordinator that would combine there to form full matches.
> Simple non-index filtering queries also suffer from this problem, but they
> hide the partial matches in a different way. I'll outline a possible solution
> for this in the comments that takes advantage of replica filtering protection
> and the repaired/unrepaired datasets...and attempts to minimize the amount of
> extra row data sent to the coordinator.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]