[
https://issues.apache.org/jira/browse/CASSANDRA-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17420953#comment-17420953
]
Caleb Rackliffe edited comment on CASSANDRA-16856 at 9/27/21, 6:26 PM:
-----------------------------------------------------------------------
Assuming the scenario in the description is something we arrived at simply via
inspection...
{quote}
If schema changes happen on node 1, but these changes do not reach node 2
immediately through the SCHEMA.PUSH mechanism and are instead first recognized
during gossip, the corresponding SCHEMA.PULL request from node 2 can catch the
node 1 schema in the middle of being modified by another schema change request.
{quote}
...the goal of making access to schema via {{SchemaKeyspace}} atomic (in the
sense that we won't expose partial schema changes) is reasonable. In terms of
testing, I'd probably just start w/ a time-boxed fuzz test of
{{SchemaKeyspace}} itself to reproduce atomicity violations, using that to
verify the fix (for an existing example of that style of multi-threaded test, see
{{DigestResolverTest#multiThreadedNoRepairNeededReadCallback()}}).
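A minimal, self-contained sketch of the kind of time-boxed fuzz harness meant here might look like the following. It deliberately does not touch the real {{SchemaKeyspace}}; {{ToySchemaStore}}, its two maps, and every name in the block are hypothetical stand-ins whose non-atomic two-step writes let a concurrent reader observe a "table" without its "columns":
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

public class SchemaAtomicityFuzzTest
{
    static class ToySchemaStore
    {
        final Map<String, String> tables = new ConcurrentHashMap<>();
        final Map<String, String> columns = new ConcurrentHashMap<>();

        // Adds a "table" in two separate steps, mimicking separate mutations to
        // system_schema.tables and system_schema.columns.
        void addTable(String name)
        {
            tables.put(name, "cf");
            columns.put(name, "c1,c2");
        }

        // Drops it in two steps as well, so the race window reopens on every cycle.
        void dropTable(String name)
        {
            columns.remove(name);
            tables.remove(name);
        }

        // A non-atomic "pull": reads tables first, columns second.
        boolean observesPartialChange(String name)
        {
            return tables.containsKey(name) && !columns.containsKey(name);
        }
    }

    public static void main(String[] args) throws Exception
    {
        ToySchemaStore store = new ToySchemaStore();
        AtomicBoolean stop = new AtomicBoolean(false);
        AtomicBoolean violated = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            while (!stop.get())
            {
                store.addTable("t1");
                store.dropTable("t1");
            }
        });

        Thread reader = new Thread(() -> {
            while (!stop.get())
                if (store.observesPartialChange("t1"))
                    violated.set(true);
        });

        writer.start();
        reader.start();
        Thread.sleep(2_000); // the time box
        stop.set(true);
        writer.join();
        reader.join();

        System.out.println(violated.get() ? "atomicity violation observed" : "no violation observed");
    }
}
{code}
A real test against {{SchemaKeyspace}} would presumably have the reader snapshot the schema tables the same way a SCHEMA.PULL response does and assert that every table it sees arrives with its columns.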
(Even if there weren't a catastrophic consequence to leaving this as-is, having
{{SchemaKeyspace}} expose some serial ordering of complete schema changes
certainly reduces the number of system states we have to reason about when
investigating other problems. Deadlock due to something like lock acquisition
order doesn't seem like a worry here...)
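To make the serialization point concrete, here is a hypothetical, simplified sketch of the general shape of the fix the ticket describes (none of the class or method names below come from the codebase): the local apply path and the path that serves a SCHEMA.PULL-style request share a lock, so a pull can never observe a half-applied change.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ToySchemaService
{
    private final Map<String, String> tables = new HashMap<>();
    private final Map<String, String> columns = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Called on the local schema-change path (analogous to merging announced mutations).
    public void applyChange(String table, String cols)
    {
        lock.writeLock().lock();
        try
        {
            tables.put(table, "cf");
            columns.put(table, cols); // both updates become visible together
        }
        finally
        {
            lock.writeLock().unlock();
        }
    }

    // Called when serving a SCHEMA.PULL-style request from a peer: the snapshot is
    // taken under the same lock, so it reflects complete changes only.
    public Map<String, String[]> snapshotForPull()
    {
        lock.readLock().lock();
        try
        {
            Map<String, String[]> snapshot = new HashMap<>();
            for (Map.Entry<String, String> e : tables.entrySet())
                snapshot.put(e.getKey(), new String[]{ e.getValue(), columns.get(e.getKey()) });
            return snapshot;
        }
        finally
        {
            lock.readLock().unlock();
        }
    }
}
{code}
A read-write lock in the sketch keeps concurrent pulls cheap while still guaranteeing that each snapshot reflects only complete schema changes.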
> Prevent broken concurrent schema pulls
> --------------------------------------
>
> Key: CASSANDRA-16856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16856
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Berenguer Blasi
> Assignee: Berenguer Blasi
> Priority: Normal
> Fix For: 4.1, 3.11.x, 4.0.x
>
>
> There's a race condition around pulling schema changes that can occur when
> the schema change push/propagation mechanism is not immediately effective
> (e.g. because of network delay, or because the pulling node is down).
> If schema changes happen on node 1, but these changes do not reach node 2
> immediately through the SCHEMA.PUSH mechanism and are instead first recognized
> during gossip, the corresponding SCHEMA.PULL request from node 2 can catch the
> node 1 schema in the middle of being modified by another schema change
> request. This can easily lead to problems (e.g. if a new table is being
> added, and the node 2 request reads the changes that need to be applied to
> system_schema.tables, but not the ones that need to be applied to
> system_schema.columns).
> This PR addresses that by synchronizing the SCHEMA.PULL "RPC call" that node 1
> executes in response to a request from node 2 with the method that applies
> schema changes on node 1.