[ 
https://issues.apache.org/jira/browse/CASSANDRA-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419581#comment-17419581
 ] 

Caleb Rackliffe edited comment on CASSANDRA-16856 at 9/24/21, 5:54 AM:
-----------------------------------------------------------------------

I've been looking at the 4.0 & trunk versions of this patch, and I'm having a 
hard time putting things together in my head. Reading the description above, it 
seems like the approach was going to be a.) synchronize 
{{SchemaKeyspace.convertSchemaToMutations()}}, effectively serializing requests 
handled by {{SchemaPullVerbHandler}} and b.) synchronize 
{{SchemaKeyspace.applyChanges()}} (I'm guessing?), which is where mutations to 
the schema keyspace are actually applied. In other words, the idea was to not 
allow concurrent reads and writes on the state protected by {{SchemaKeyspace}}. 
(Would we also need to synchronize {{truncate()}} and 
{{saveSystemKeyspacesSchema()}}?)

It seems like only "a" was done here and not "b", and the attached test is 
essentially a trip-wire in case anyone ever tries to remove the monitor lock.
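
To make the "a" plus "b" idea concrete, here is a minimal sketch of what guarding both the read path and the write path with the same monitor could look like. This is not the actual Cassandra code: the class {{SchemaLockSketch}}, its two lists, and the methods {{applyChanges()}}, {{snapshotSizes()}} and {{countTornSnapshots()}} are hypothetical stand-ins for the schema keyspace state and the methods named above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Cassandra code: two lists stand in for rows in
// system_schema.tables and system_schema.columns.
final class SchemaLockSketch {
    private static final List<String> tables = new ArrayList<>();
    private static final List<String> columns = new ArrayList<>();

    // Write path ("b"): both lists are updated while holding the class monitor.
    static synchronized void applyChanges(String table, String column) {
        tables.add(table);
        columns.add(table + "." + column);
    }

    // Read path ("a"): takes the same class monitor, so it can never run
    // between the two add() calls in applyChanges().
    static synchronized int[] snapshotSizes() {
        return new int[] { tables.size(), columns.size() };
    }

    // Runs one writer thread against a polling reader and counts snapshots
    // in which the two lists disagree ("torn" reads).
    static int countTornSnapshots(int writes) {
        Thread writer = new Thread(() -> {
            for (int i = 0; i < writes; i++) {
                applyChanges("t" + i, "c");
            }
        });
        writer.start();
        int torn = 0;
        while (writer.isAlive()) {
            int[] sizes = snapshotSizes();
            if (sizes[0] != sizes[1]) {
                torn++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return torn;
    }
}
```

Because both static methods are {{synchronized}}, they share the monitor of the class object, so a reader only ever sees the state before or after a complete write. If only the read method were synchronized (the "a"-only situation), a write could still be in progress while a read runs, and the reader could see a table without its columns.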

CC [~bereng] [~brandon.williams]


was (Author: maedhroz):
I've been looking at the 4.0 & trunk versions of this patch, and I'm having a 
hard time putting things together in my head. Reading the description above, it 
seems like the approach was going to be a.) synchronize 
{{SchemaKeyspace.convertSchemaToMutations()}}, effectively serializing requests 
handled by {{SchemaPullVerbHandler}} and b.) synchronize 
{{SchemaKeyspace.applyChanges()}} (I'm guessing?), which is where mutations to 
the schema keyspace are actually applied. In other words, the idea was to not 
allow concurrent reads and writes on the state protected by {{SchemaKeyspace}}. 
(Would we also need to synchronize {{truncate()}}?)

It seems like only "a" was done here, and not "b".

CC [~bereng] [~brandon.williams]

> Prevent broken concurrent schema pulls
> --------------------------------------
>
>                 Key: CASSANDRA-16856
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16856
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.1, 3.11.x, 4.0.x
>
>
> There's a race condition around pulling schema changes that can occur when 
> the schema push/propagation mechanism is not immediately effective (e.g. 
> because of network delay, or because the pulling node is down).
> If schema changes happen on node 1 but do not reach node 2 immediately 
> through the SCHEMA.PUSH mechanism, and are first recognized during 
> gossiping, the corresponding SCHEMA.PULL request from node 2 can catch the 
> node 1 schema in the middle of being modified by another schema change 
> request. This can easily lead to problems (e.g. if a new table is being 
> added, and the node 2 request reads the changes that need to be applied to 
> system_schema.tables, but not the ones that need to be applied to 
> system_schema.columns).
> This PR addresses that by synchronizing the SCHEMA.PULL "RPC call" executed 
> on node 1 in response to a request from node 2 with the method that applies 
> schema changes on node 1.



