[jira] [Commented] (CASSANDRA-16856) Prevent broken concurrent schema pulls

Berenguer Blasi (Jira) Mon, 27 Sep 2021 22:30:09 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421164#comment-17421164
 ]


Berenguer Blasi commented on CASSANDRA-16856:
---------------------------------------------

[~maedhroz][~jmckenzie] the origin of this ticket is a multi-node fallout test 
that was failing to start some nodes because of this race.

[~jmckenzie] good point. I can add the comment + ticket ref in all those 
places for clarity.

[~maedhroz] Brandon and I had a discussion on how to better test this. Besides 
the one you point out, there was another dtest I played with that was also 
very reusable. But in the end we had a worry: look at {{applyChanges()}}, where 
the method itself is not synchronized but the paths that call it are. The 
worry is that we'd be testing that the call paths are synchronized, not that 
the method itself is. Instantiating the class itself and mocking everything is 
'not doable' within reason: the class is final and can't be extended, etc. So 
that junit is what I could come up with. And tbh I dislike both approaches, 
the one I did and going for a dtest, but I couldn't think of anything else. 
Maybe I am overthinking it.
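
To make the worry concrete, here is a minimal sketch. It is not the Cassandra 
code and not what the patch does: {{SchemaSketch}}, {{mergeAndAnnounce()}} and 
{{snapshotForPull()}} are hypothetical names, and the {{Thread.holdsLock()}} 
guard is only one assumed way to make "callers must hold the monitor" 
checkable from a unit test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the final schema class discussed above.
final class SchemaSketch {
    private final List<String> applied = new ArrayList<>();

    // Synchronized entry point: one of the "paths that call it".
    synchronized void mergeAndAnnounce(String change) {
        applyChanges(change);
    }

    // Synchronized reader, standing in for the pull handler.
    synchronized List<String> snapshotForPull() {
        return new ArrayList<>(applied);
    }

    // Not synchronized itself: it relies on every caller holding the monitor.
    // The guard (an assumption of this sketch) turns that contract into
    // something a test can observe.
    void applyChanges(String change) {
        if (!Thread.holdsLock(this)) {
            throw new IllegalStateException("applyChanges() called without the schema lock");
        }
        applied.add(change + ":tables");
        applied.add(change + ":columns");
    }
}
```

A test that only goes through {{mergeAndAnnounce()}} passes whether or not 
some other caller forgets the lock, which is exactly the gap described above. 
Calling {{applyChanges()}} directly without the monitor trips the guard.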

> Prevent broken concurrent schema pulls
> --------------------------------------
>
>                 Key: CASSANDRA-16856
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16856
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Berenguer Blasi
>            Assignee: Berenguer Blasi
>            Priority: Normal
>             Fix For: 4.1, 3.11.x, 4.0.x
>
>
> There's a race condition around pulling schema changes, that can occur in 
> case the schema changes push/propagation mechanism is not immediately 
> effective (e.g. because of network delay, or because of the pulling node 
> being down, etc.).
> If schema changes happen on node 1, these changes do not reach node 2 
> immediately through the SCHEMA.PUSH mechanism, and are first recognized 
> during gossiping, the corresponding SCHEMA.PULL request from node 2 can catch 
> the node 1 schema in the middle of it being modified by another schema change 
> request. This can easily lead to problems (e.g. if a new table is being 
> added, and the node 2 request reads the changes that need to be applied to  
> system_schema.tables, but not the ones that need to be applied to 
> system_schema.columns).
> This PR addresses that by synchronizing the SCHEMA.PULL "RPC call" executed 
> in node 1 by a request from node 2 with the method for applying schema 
> changes in node 1.
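
The description above can be sketched roughly as follows. This is an 
illustration under assumptions, not the actual patch: {{SchemaPullSketch}}, 
{{applyChange()}} and {{handlePull()}} are hypothetical names, and two 
in-memory lists stand in for system_schema.tables and system_schema.columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of node 1: one schema change touches two "tables".
final class SchemaPullSketch {
    private final List<String> tables = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    // Applying a change and serving a pull share the same monitor, so a pull
    // can never observe the tables row without the matching columns row.
    synchronized void applyChange(String table) {
        tables.add(table);
        columns.add(table + ".id");
    }

    // Stand-in for the pull handler: returns {table rows, column rows}.
    synchronized int[] handlePull() {
        return new int[] { tables.size(), columns.size() };
    }

    // Pulls repeatedly while another thread applies changes; returns true
    // only if every pull saw a consistent snapshot.
    static boolean pullsAreConsistent(int changes) {
        SchemaPullSketch node1 = new SchemaPullSketch();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < changes; i++) {
                node1.applyChange("t" + i);
            }
        });
        writer.start();
        boolean consistent = true;
        while (writer.isAlive()) {
            int[] pulled = node1.handlePull();
            consistent &= pulled[0] == pulled[1];
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        int[] last = node1.handlePull();
        return consistent && last[0] == changes && last[1] == changes;
    }
}
```

Dropping {{synchronized}} from either method in this sketch reintroduces the 
window in which a pull can read the tables row but not the columns row.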



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
