J Hickey created CASSANDRA-16408:
------------------------------------
Summary: Unable to bootstrap/join new nodes to existing 4.0 cluster
Key: CASSANDRA-16408
URL: https://issues.apache.org/jira/browse/CASSANDRA-16408
Project: Cassandra
Issue Type: Bug
Components: Cluster/Membership, Consistency/Bootstrap and Decommission
Reporter: J Hickey
A new node added to an existing 4.0 cluster gets stuck in the
bootstrap/joining state permanently, with no clear error.
Version: 4.0-beta4 (also seen in 4.0-beta3; NOT seen in 3.11.x) on Java 8
(OpenJDK 1.8.0_275)
Topology: single DC with 3 racks, using Ec2Snitch, 1 seed node per rack
Relevant cassandra.yaml settings:
  auto_bootstrap: true (implicit)
  seeds: the same 3 nodes on all nodes
  num_tokens: 16
  allocate_tokens_for_local_replication_factor: 3
  server_encryption_options.internode_encryption: all
  server_encryption_options.enabled: true
  server_encryption_options.optional: false
  server_encryption_options.require_client_auth: true
  client_encryption_options.enabled: true
  client_encryption_options.optional: false
  client_encryption_options.require_client_auth: true
Scenario: Bring up the 3 seed nodes to create a new cluster. Add a user
keyspace: create keyspace test with replication = { 'class':
'NetworkTopologyStrategy', 'us-east-1-dc': 3 }; and insert some test data. Wait
at least 10 minutes after the initial 3 seed nodes come up (nodes join
successfully if they are brought up at the same time as the seeds, but not if
they are brought up later). Start Cassandra on a fourth node. Cassandra begins
to bootstrap but never finishes (I have left it running overnight); it neither
exits nor logs any errors. nodetool status from any node shows the new node as
UJ. nodetool netstats from the new node shows the file from the test keyspace
at 100% received. The logs show bootstrap and streaming starting, then nothing
further and no errors.
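
For anyone trying to reproduce this, a small script can make the stuck UJ state
easier to spot over time. The following is only a rough sketch, not part of the
original report: it assumes the usual nodetool status row layout (a two-letter
status/state code followed by the node address), and the function name
joining_nodes and the sample addresses and host IDs are made up for
illustration.

```python
import re
from typing import List

# A node row in `nodetool status` output starts with a two-letter code:
# status (U=Up, D=Down, ?=unknown) then state (N/L/J/M), then the address.
# Assumption: this layout matches the nodetool version in use.
_NODE_ROW = re.compile(r"^([UD?][NLJM])\s+(\S+)")


def joining_nodes(status_output: str) -> List[str]:
    """Return the addresses of nodes whose state letter is J (Joining)."""
    joining = []
    for line in status_output.splitlines():
        match = _NODE_ROW.match(line.strip())
        if match and match.group(1)[1] == "J":
            joining.append(match.group(2))
    return joining


# Illustrative sample only; addresses, loads, and host IDs are hypothetical.
SAMPLE = """\
Datacenter: example-dc
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns  Host ID  Rack
UN  10.0.0.1  105.2 KiB  16      ?     aaaa     1a
UN  10.0.0.2  98.7 KiB   16      ?     bbbb     1b
UN  10.0.0.3  101.4 KiB  16      ?     cccc     1c
UJ  10.0.0.4  75.1 KiB   16      ?     dddd     1a
"""

if __name__ == "__main__":
    print(joining_nodes(SAMPLE))
```

In practice the text could come from something like
subprocess.run(["nodetool", "status"], capture_output=True, text=True).stdout,
polled periodically; a node that is still reported as joining long after
netstats shows 100% received would match the behaviour described above.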
I have also tried this with allocate_tokens_for_local_replication_factor
disabled and still hit the issue. I have also tried it with no user
keyspace/data at all, on a completely empty cluster, with the same result. The
only ways I have found to bring up a decently sized cluster on 4.0 are to
disable allocate_tokens_for_local_replication_factor (to avoid token
collisions, as mentioned in other issues) and bring up all nodes at about the
same time, or to use auto_bootstrap: false. I have no issue adding a new node
in a similar fashion to a 3.11.x cluster.