[ 
https://issues.apache.org/jira/browse/CASSANDRA-16878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17494810#comment-17494810
 ] 

Yifan Cai commented on CASSANDRA-16878:
---------------------------------------

Again, you are right :D

The related mutations and the schema change (i.e. persist and update in-memory) 
needs to be linearized. The related mutations are the ones contain updates to 
the changed schema. Before applying the schema change, it creates a barrier and 
waits for all related pending mutations to complete. No related mutations 
arriving after the barrier can be created until the schema change completes. 
Maybe use a readwrite lock to implement. It could result into write timeouts 
during schema change.

Alternatively, maybe we can reorder the mutations when replaying. But it would 
require to serialize the schema version, and change the commit log format. 
Group the mutations belongs to the same schema version and replay each group.

> Race in commit log replay can cause rejected mutations
> ------------------------------------------------------
>
>                 Key: CASSANDRA-16878
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16878
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Andres de la Peña
>            Assignee: Andres de la Peña
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> We don't force order in the execution of replayed mutations and hence a 
> mutation can move ahead of or behind a schema change it relies on (e.g. 
> added/removed column), which can then cause it to be rejected because of a 
> schema mismatch.
> To fix this, we need to identify schema mutations and make sure the log 
> enforces their execution after all previous mutations have completed and 
> before anything following is started.
> Schema mutations are 
> [flushed|https://github.com/apache/cassandra/blob/cassandra-4.0.0/src/java/org/apache/cassandra/schema/SchemaKeyspace.java#L1266-L1271]
>  after being applied, so this only would be a problem if the node abruptly 
> stops before flushing the schema mutation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to