[
https://issues.apache.org/jira/browse/CASSANDRA-16878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17484816#comment-17484816
]
Andres de la Peña commented on CASSANDRA-16878:
-----------------------------------------------
I also think that the mutations are written in order.
[~yifanc] your test is quite useful for this. Is it OK if I add it to the PR,
setting you as co-author? The test seems to be flaky because the last assert
doesn't wait for the asynchronous processing of the replayed mutations.
This can be seen in
[this|https://app.circleci.com/pipelines/github/adelapena/cassandra/1263/workflows/30d3db98-77d3-458b-b800-c22777c94305/jobs/11946]
1000-iteration run in the multiplexer, which hits 123 failures. The flakiness
is easily fixed by using
[{{spinAssertEquals}}|https://github.com/adelapena/cassandra/commit/9845b39ede623d840ef8ed6233ac6f7cd8e57047]
to wait until the 99 mutations are applied. This makes the 1000 iterations
[pass|https://app.circleci.com/pipelines/github/adelapena/cassandra/1265/workflows/6910b8f2-1fd0-42bf-b67a-6e6c1d83ec77/jobs/11951].
Alternatively, we can call
[{{CommitLogReplayer#blockForWrites}}|https://github.com/adelapena/cassandra/commit/c3c156f021085c046d6631f042e9b54534ecfc2f]
to wait for all the mutations, which also makes the repeated run
[pass|https://app.circleci.com/pipelines/github/adelapena/cassandra/1266/workflows/f950bf46-af27-4f38-81ce-bc35186e6309/jobs/11955].
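To illustrate why polling removes the flakiness, here is a minimal, standalone sketch of the spin-assert idea. It is not the {{spinAssertEquals}} helper used in the linked commit: {{SpinAssertSketch}} and {{spinUntilEquals}} are hypothetical names, and the background task merely stands in for the asynchronous application of the 99 replayed mutations.

```java
import java.util.Objects;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class SpinAssertSketch
{
    // Hypothetical helper (assumption, not Cassandra's API): re-reads the actual
    // value until it equals the expected one, failing only once the timeout elapses.
    static <T> void spinUntilEquals(T expected, Supplier<T> actual, long timeout, TimeUnit unit)
    {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        T last = actual.get();
        while (!Objects.equals(expected, last))
        {
            if (System.nanoTime() - deadline >= 0)
                throw new AssertionError("expected " + expected + " but was " + last);
            try
            {
                Thread.sleep(10); // back off briefly before polling again
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for " + expected);
            }
            last = actual.get();
        }
    }

    public static void main(String[] args)
    {
        AtomicInteger applied = new AtomicInteger();

        // Stand-in for the asynchronous processing of replayed mutations.
        CompletableFuture.runAsync(() -> {
            for (int i = 0; i < 99; i++)
                applied.incrementAndGet();
        });

        // A plain assertEquals here could observe a partial count;
        // polling waits until all 99 have been applied.
        spinUntilEquals(99, applied::get, 10, TimeUnit.SECONDS);
        System.out.println("applied = " + applied.get());
    }
}
```

The trade-off against blocking on the replayer directly is that polling needs a timeout, whereas waiting on the replay futures is deterministic.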
> Race in commit log replay can cause rejected mutations
> ------------------------------------------------------
>
> Key: CASSANDRA-16878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16878
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Andres de la Peña
> Assignee: Andres de la Peña
> Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> We don't force order in the execution of replayed mutations and hence a
> mutation can move ahead of or behind a schema change it relies on (e.g.
> added/removed column), which can then cause it to be rejected because of a
> schema mismatch.
> To fix this, we need to identify schema mutations and make sure the log
> enforces their execution after all previous mutations have completed and
> before anything following is started.
> Schema mutations are
> [flushed|https://github.com/apache/cassandra/blob/cassandra-4.0.0/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L1266-L1271]
> after being applied, so this would only be a problem if the node stops
> abruptly before flushing the schema mutation.
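
The ordering requirement in the description above can be sketched as a barrier in the replay loop. This is only an illustration under stated assumptions, not the actual patch: {{ReplayBarrierSketch}}, {{Mutation}} and {{replay}} are hypothetical names, and a plain thread pool stands in for the stage that applies replayed mutations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReplayBarrierSketch
{
    // Hypothetical stand-in for a mutation read from the commit log.
    static final class Mutation
    {
        final boolean isSchemaChange;
        final Runnable action;

        Mutation(boolean isSchemaChange, Runnable action)
        {
            this.isSchemaChange = isSchemaChange;
            this.action = action;
        }
    }

    static void replay(List<Mutation> log, ExecutorService executor)
    {
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (Mutation m : log)
        {
            if (m.isSchemaChange)
            {
                // Barrier: wait for every previously submitted mutation...
                CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
                pending.clear();
                // ...then apply the schema change before anything that follows starts.
                m.action.run();
            }
            else
            {
                // Regular mutations may still run concurrently with each other.
                pending.add(CompletableFuture.runAsync(m.action, executor));
            }
        }
        // Wait for the mutations submitted after the last schema change.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
    }

    public static void main(String[] args)
    {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        List<String> applied = Collections.synchronizedList(new ArrayList<>());
        List<Mutation> log = Arrays.asList(
            new Mutation(false, () -> applied.add("write")),
            new Mutation(false, () -> applied.add("write")),
            new Mutation(false, () -> applied.add("write")),
            new Mutation(true, () -> applied.add("schema")),
            new Mutation(false, () -> applied.add("write")));
        replay(log, executor);
        executor.shutdown();
        // The three earlier writes finish first, so the schema change lands at index 3.
        System.out.println("schema change applied at index " + applied.indexOf("schema"));
    }
}
```

Only schema changes pay the cost of the barrier; ordinary mutations keep their concurrency between barriers.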