[ 
https://issues.apache.org/jira/browse/CASSANDRA-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420501#comment-17420501
 ] 

Berenguer Blasi edited comment on CASSANDRA-16856 at 9/27/21, 6:42 AM:
-----------------------------------------------------------------------

What you mention makes sense. The concurrent read/write path is not protected.
I am going to open a ticket for this modification, where we'll be able to run
tests, as this is not a part of the code I know inside out. Let's cross
fingers there are no exotic scenarios leading to a deadlock.

As for the junit, this is the best we could come up with without going into a
multinode/byteman craze that might end up not testing the actual path. It's
better than nothing, but suggestions are welcome.


was (Author: bereng):
What you mention makes sense. The concurrent read/write path is not protected.
I am going to open a ticket for this modification, where we'll be able to run
tests, as this is not a part of the code I know inside out. Let's cross
fingers there are no exotic scenarios leading to a deadlock.

> Prevent broken concurrent schema pulls
> --------------------------------------
>
>                 Key: CASSANDRA-16856
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16856
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.1, 3.11.x, 4.0.x
>
>
> There's a race condition around pulling schema changes that can occur when
> the schema change push/propagation mechanism is not immediately
> effective (e.g. because of network delay, or because the pulling node
> is down).
> If schema changes happen on node 1 but do not reach node 2
> immediately through the SCHEMA.PUSH mechanism, and are first recognized
> during gossiping, the corresponding SCHEMA.PULL request from node 2 can catch
> the node 1 schema in the middle of being modified by another schema change
> request. This can easily lead to problems (e.g. if a new table is being
> added, and the node 2 request reads the changes that need to be applied to
> system_schema.tables, but not the ones that need to be applied to
> system_schema.columns).
> This PR addresses that by synchronizing the SCHEMA.PULL "RPC call" executed
> on node 1 for a request from node 2 with the method that applies schema
> changes on node 1.
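
The idea in the description above can be illustrated with a minimal Java
sketch. This is not Cassandra's actual code: the class SchemaPullSketch and
its methods addTable, pull and countTornPulls are hypothetical names, and two
in-memory lists stand in for system_schema.tables and system_schema.columns.
It only shows that when the pull handler and the schema-change method share
one lock, a pull cannot observe a half-applied change.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for node 1's schema state; not Cassandra's real classes.
final class SchemaPullSketch {
    private final List<String> tables = new ArrayList<>();   // stands in for system_schema.tables
    private final List<String> columns = new ArrayList<>();  // stands in for system_schema.columns

    // Applies one schema change as two separate writes, as when a table is added.
    synchronized void addTable(String table, String column) {
        tables.add(table);
        columns.add(table + "." + column);
    }

    // Serves a pull request. It shares the monitor with addTable, so it can
    // never see the tables write without the matching columns write.
    synchronized int[] pull() {
        return new int[] { tables.size(), columns.size() };
    }

    // Runs one writer thread against a polling reader and returns how many
    // inconsistent ("torn") snapshots the reader observed.
    static int countTornPulls(int changes) {
        SchemaPullSketch schema = new SchemaPullSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < changes; i++) {
                schema.addTable("t" + i, "c");
            }
        });
        writer.start();
        int torn = 0;
        while (writer.isAlive()) {
            int[] snapshot = schema.pull();
            if (snapshot[0] != snapshot[1]) {
                torn++;
            }
        }
        return torn;
    }

    public static void main(String[] args) {
        System.out.println("torn pulls: " + countTornPulls(100_000));
    }
}
```

Removing synchronized from either method would allow a pull to land between
the two writes, which is the broken pull described in the ticket. Holding a
single lock across both paths is also where the deadlock concern comes from
if the real code takes other locks in a different order.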



