[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231782#comment-17231782 ]

David Capwell edited comment on CASSANDRA-15158 at 11/13/20, 8:38 PM:
----------------------------------------------------------------------

Starting commit

CI Results: Yellow. On 3.1, org.apache.cassandra.service.MigrationCoordinatorTest 
fails in CI but passes locally; -on trunk, 
org.apache.cassandra.distributed.test.ring.BootstrapTest fails frequently because 
schemas are not present (added a commit which increases the timeout from 30s to 90s)-; 
the remaining failures are other expected issues.
||Branch||Source||Circle CI||Jenkins||
|cassandra-3.0|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.0-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/200/]|
|cassandra-3.11|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-cassandra-3.11-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/201/]|
|trunk|[branch|https://github.com/dcapwell/cassandra/tree/commit_remote_branch/CASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=commit_remote_branch%2FCASSANDRA-15158-trunk-7E401495-E38F-4857-80C1-2C27028F572E]|[build|https://ci-cassandra.apache.org/job/Cassandra-devbranch/202/]|




> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes is unexpectedly slow, 
> we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> from getting stuck for extended periods as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed-out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  
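
For readers who want a concrete picture of the schema agreement check described in the issue, here is a minimal sketch. It is not the code from the patch: the class name SchemaAgreement, both method names, and the idea of passing schema versions in as plain suppliers are assumptions made for illustration. It only shows the general shape of "poll until the local schema version matches every live node, or give up at a deadline", as opposed to waiting on one latch per in-flight request.

```java
import java.util.Map;
import java.util.UUID;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of Cassandra's API. */
final class SchemaAgreement {
    private SchemaAgreement() {}

    /** True when the local version is known and every live node reports the same one. */
    static boolean agrees(UUID localVersion, Map<String, UUID> liveNodeVersions) {
        if (localVersion == null) {
            return false; // no schema received yet
        }
        for (UUID remote : liveNodeVersions.values()) {
            if (!localVersion.equals(remote)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Polls the two suppliers until the versions agree or the timeout elapses.
     * Returns true on agreement, false on timeout or interruption.
     */
    static boolean awaitAgreement(Supplier<UUID> local,
                                  Supplier<Map<String, UUID>> live,
                                  long timeoutMillis,
                                  long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (agrees(local.get(), live.get())) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // caller decides whether to abort the bootstrap
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }
}
```

Unlike per-request latches, a check of this shape does not depend on any individual callback surviving in the messaging service: a response that arrives late, or via a different request, still moves the node toward agreement.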


