J Hickey created CASSANDRA-16408:
------------------------------------

             Summary: Unable to bootstrap/join new nodes to existing 4.0 cluster
                 Key: CASSANDRA-16408
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16408
             Project: Cassandra
          Issue Type: Bug
          Components: Cluster/Membership, Consistency/Bootstrap and Decommission
            Reporter: J Hickey


A new node added to an existing 4.0 cluster stays stuck in the 
bootstrap/joining state permanently, with no clear error.

Version: 4.0-beta4 (also seen in 4.0-beta3, NOT seen in 3.11.x) on 
Java 8 (OpenJDK 1.8.0_275)
Topology: single DC with 3 racks using Ec2Snitch, 1 seed node per rack
Relevant cassandra.yaml settings:
 - auto_bootstrap: true (implicit)
 - seeds: the same 3 nodes, on all nodes
 - num_tokens: 16
 - allocate_tokens_for_local_replication_factor: 3
 - server_encryption_options.internode_encryption: all
 - server_encryption_options.enabled: true
 - server_encryption_options.optional: false
 - server_encryption_options.require_client_auth: true
 - client_encryption_options.enabled: true
 - client_encryption_options.optional: false
 - client_encryption_options.require_client_auth: true


Scenario:
 1. Bring up the 3 seed nodes to create a new cluster.
 2. Add a user keyspace and insert some test data:
    create keyspace test with replication = { 'class': 
    'NetworkTopologyStrategy', 'us-east-1-dc': 3 };
 3. Wait at least 10 minutes after the initial 3 seed nodes come up (nodes 
    will join if they are brought up at the same time as the seeds, but not 
    if they are brought up later).
 4. Start cassandra on a fourth node.

Result: Cassandra begins to bootstrap but never finishes (I have left this 
running overnight); it neither exits nor logs any errors. nodetool status 
from any node shows the new node as UJ. nodetool netstats from the new node 
shows the file from the test keyspace as 100% received. The logs show 
bootstrap starting and streaming starting, then nothing further and no 
errors.
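
Since the stuck node never exits or logs an error, the only visible symptom is 
the UJ state in nodetool status. A minimal sketch of a watcher for that 
symptom follows, assuming nodetool is on the PATH and that node rows start 
with the two-letter status/state code followed by the address. The function 
names, timeout and poll interval are hypothetical choices for illustration, 
not part of Cassandra.

```python
import subprocess
import time


def joining_nodes(status_output):
    """Return addresses of nodes that `nodetool status` reports as UJ."""
    addresses = []
    for line in status_output.splitlines():
        fields = line.split()
        # Node rows begin with a status/state code (e.g. UN, UJ), then the address.
        if len(fields) >= 2 and fields[0] == "UJ":
            addresses.append(fields[1])
    return addresses


def run_nodetool_status():
    """Run `nodetool status` and return its standard output as text."""
    result = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_for_join(timeout_s=3600, poll_s=60, run=run_nodetool_status):
    """Poll until no node is UJ; on timeout, return the nodes still joining."""
    deadline = time.monotonic() + timeout_s
    while True:
        stuck = joining_nodes(run())
        if not stuck or time.monotonic() >= deadline:
            return stuck
        time.sleep(poll_s)
```

Calling wait_for_join() from any node returns an empty list once every node 
has finished joining, or the addresses still in UJ when the timeout expires, 
which would flag the hang described here without waiting overnight.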

Worth noting: I have also tried this with 
allocate_tokens_for_local_replication_factor disabled and still hit this 
issue. I have also tried it without any user keyspace or data, on a 
completely empty cluster, and still hit it. The only ways I have found to 
bring up a decently sized cluster on 4.0 are to disable 
allocate_tokens_for_local_replication_factor (to avoid collisions, as 
mentioned in other issues) and bring up all nodes at about the same time, or 
to use auto_bootstrap: false. I have no problem adding a new node in a 
similar fashion to a 3.11.x cluster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
