Vincent White created CASSANDRA-15158:
-----------------------------------------
Summary: Wait for schema agreement rather than in-flight schema
requests when bootstrapping
Key: CASSANDRA-15158
URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
Project: Cassandra
Issue Type: Bug
Components: Cluster/Gossip
Reporter: Vincent White
Currently when a node is bootstrapping we use a set of latches
(org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of
in-flight schema pull requests, and we don't proceed with bootstrapping/streaming
until all the latches are released (or we time out waiting for each one). One
issue with this is that if we have a large schema, or the retrieval of the
schema from the other nodes is unexpectedly slow, then we have no explicit
check in place to ensure we have actually received a schema before we proceed.
While it's possible to increase "migration_task_wait_in_seconds" to force the
node to wait on each latch for longer, there are cases where this doesn't help
because the callbacks for the schema pull requests have expired off the
messaging service's callback map
(org.apache.cassandra.net.MessagingService#callbacks) after getMinRpcTimeout()
(2 seconds by default) before the other nodes were able to respond to the new
node.
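To illustrate the gap described above, here is a minimal sketch of latch-based waiting. It is not the Cassandra code: the class LatchWaitSketch and the method waitForInflight are hypothetical names, and the real logic around MigrationTask#inflightTasks may differ. The point is only that a timed-out latch lets the caller continue without any evidence that a schema arrived.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from the Cassandra code base.
public class LatchWaitSketch {
    /**
     * Waits on each latch for at most timeoutMillis and returns the number
     * of latches that were never released. A caller that ignores this count
     * proceeds as if every schema pull had completed.
     */
    static int waitForInflight(List<CountDownLatch> inflight, long timeoutMillis) {
        int timedOut = 0;
        for (CountDownLatch latch : inflight) {
            try {
                // await returns false when the timeout elapses first
                if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS))
                    timedOut++;
            } catch (InterruptedException e) {
                // Preserve the interrupt and treat the latch as unreleased
                Thread.currentThread().interrupt();
                timedOut++;
            }
        }
        return timedOut;
    }
}
```

If the response callback has already expired, the latch in this sketch is never counted down, so raising the timeout only delays the same outcome.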
This patch checks for schema agreement between the bootstrapping node and the
rest of the live nodes before proceeding with bootstrapping. It also adds a
check to prevent the new node from flooding existing nodes with simultaneous
schema pull requests, as can happen in large clusters.
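The agreement check can be sketched roughly as follows. This is an assumption-laden illustration, not the code in the linked PoC: SchemaAgreementSketch, inAgreement and waitForAgreement are hypothetical names, and the two suppliers stand in for whatever the node uses to read its own schema version and the versions reported by live nodes.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

// Hypothetical illustration only; not the PoC patch.
public class SchemaAgreementSketch {
    /** True when the local version is known and every live node reports the same one. */
    static boolean inAgreement(UUID local, Map<String, UUID> liveVersions) {
        if (local == null)
            return false; // no schema received yet
        for (UUID version : liveVersions.values()) {
            if (!local.equals(version))
                return false;
        }
        return true;
    }

    /**
     * Polls up to maxAttempts times, sleeping sleepMillis between attempts,
     * and reports whether agreement was reached within that budget.
     */
    static boolean waitForAgreement(Supplier<UUID> local,
                                    Supplier<Map<String, UUID>> live,
                                    int maxAttempts,
                                    long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (inAgreement(local.get(), live.get()))
                return true;
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }
}
```

Unlike waiting on latches, this condition depends on the observed schema versions rather than on whether a particular request callback is still registered.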
3.11 PoC: https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema