[
https://issues.apache.org/jira/browse/CASSANDRA-16856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17419581#comment-17419581
]
Caleb Rackliffe edited comment on CASSANDRA-16856 at 9/24/21, 5:54 AM:
-----------------------------------------------------------------------
I've been looking at the 4.0 & trunk versions of this patch, and I'm having a
hard time putting things together in my head. Reading the description above, it
seems like the approach was going to be a.) synchronize
{{SchemaKeyspace.convertSchemaToMutations()}}, effectively serializing requests
handled by {{SchemaPullVerbHandler}} and b.) synchronize
{{SchemaKeyspace.applyChanges()}} (I'm guessing?), which is where mutations to
the schema keyspace are actually applied. In other words, the idea was to not
allow concurrent reads and writes on the state protected by {{SchemaKeyspace}}.
(Would we also need to synchronize {{truncate()}} and
{{saveSystemKeyspacesSchema()}}?)
It seems like only "a" was done here and not "b", and the attached test is
essentially a trip-wire in case anyone ever tries to remove the monitor lock.
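To make the difference between "a" and "b" concrete, here is a minimal sketch of what doing both could look like. It is not the patch and not Cassandra code: {{SchemaLockSketch}}, {{readForPull()}} and the string-based "rows" are hypothetical stand-ins, and only the name {{applyChanges()}} is borrowed from the discussion above. The point is just that the read path and the write path take the same monitor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the schema state; not Cassandra's SchemaKeyspace.
final class SchemaLockSketch {
    private final List<String> tables = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    // "a": read path that answers a schema pull. Holds this object's monitor.
    synchronized List<String> readForPull() {
        List<String> rows = new ArrayList<>();
        for (String t : tables) rows.add("tables:" + t);
        for (String c : columns) rows.add("columns:" + c);
        return rows;
    }

    // "b": write path. Taking the same monitor means a pull cannot run
    // between the tables write and the columns write.
    synchronized void applyChanges(String table, List<String> cols) {
        tables.add(table);
        for (String c : cols) columns.add(table + "." + c);
    }

    public static void main(String[] args) {
        SchemaLockSketch schema = new SchemaLockSketch();
        schema.applyChanges("users", Arrays.asList("id", "name"));
        System.out.println(schema.readForPull());
    }
}
```

If {{synchronized}} were dropped from {{applyChanges()}} in this sketch, concurrent pulls would still be serialized against each other, but a pull could run while a write is halfway through, which is the case "b" is meant to cover.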
CC [~bereng] [~brandon.williams]
was (Author: maedhroz):
I've been looking at the 4.0 & trunk versions of this patch, and I'm having a
hard time putting things together in my head. Reading the description above, it
seems like the approach was going to be a.) synchronize
{{SchemaKeyspace.convertSchemaToMutations()}}, effectively serializing requests
handled by {{SchemaPullVerbHandler}} and b.) synchronize
{{SchemaKeyspace.applyChanges()}} (I'm guessing?), which is where mutations to
the schema keyspace are actually applied. In other words, the idea was to not
allow concurrent reads and writes on the state protected by {{SchemaKeyspace}}.
(Would we also need to synchronize {{truncate()}}?)
It seems like only "a" was done here, and not "b".
CC [~bereng] [~brandon.williams]
> Prevent broken concurrent schema pulls
> --------------------------------------
>
> Key: CASSANDRA-16856
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16856
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Berenguer Blasi
> Assignee: Berenguer Blasi
> Priority: Normal
> Fix For: 4.1, 3.11.x, 4.0.x
>
>
> There's a race condition around pulling schema changes that can occur when
> the schema push/propagation mechanism is not immediately effective (e.g.
> because of network delay, or because the pulling node is down).
> If schema changes happen on node 1 but do not reach node 2 immediately
> through the SCHEMA.PUSH mechanism, and are first recognized during gossip,
> the corresponding SCHEMA.PULL request from node 2 can catch the node 1
> schema in the middle of being modified by another schema change request.
> This can easily lead to problems (e.g. if a new table is being added, and
> the node 2 request reads the changes that need to be applied to
> system_schema.tables, but not the ones that need to be applied to
> system_schema.columns).
> This PR addresses that by synchronizing the SCHEMA.PULL "RPC call" executed
> on node 1 in response to a request from node 2 with the method that applies
> schema changes on node 1.
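
The partial read described above can be illustrated with a small single-threaded sketch. Everything in it is hypothetical ({{TornReadSketch}}, {{applyUnsynchronized()}}, the {{betweenSteps}} hook) and it does not use Cassandra code; the hook only simulates a pull request being scheduled between the two steps of a schema write, so that the intermediate state is observable deterministically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a schema write that touches two tables in sequence.
final class TornReadSketch {
    private final List<String> tables = new ArrayList<>();
    private final List<String> columns = new ArrayList<>();

    // Unprotected write: betweenSteps stands in for a concurrent SCHEMA.PULL
    // that happens to run after the tables write but before the columns write.
    void applyUnsynchronized(String table, String column, Runnable betweenSteps) {
        tables.add(table);
        betweenSteps.run();
        columns.add(table + "." + column);
    }

    // What a pull would serialize at the moment it runs.
    String read() {
        return "tables=" + tables + " columns=" + columns;
    }

    public static void main(String[] args) {
        TornReadSketch store = new TornReadSketch();
        String[] seenByPull = new String[1];
        store.applyUnsynchronized("users", "id", () -> seenByPull[0] = store.read());
        System.out.println("mid-write pull saw: " + seenByPull[0]);
        System.out.println("after write:        " + store.read());
    }
}
```

In this sketch the simulated pull sees the new table with no columns, which mirrors the system_schema.tables / system_schema.columns example in the description.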