[
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401149#comment-17401149
]
Roman Chyla commented on CASSANDRA-16364:
-----------------------------------------
Could this be playing any role?
[https://github.com/apache/cassandra/blob/8acbbe042b236c6948845ecd7af093c6f0fa3e4b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java#L2440]
I have double-checked: I nuked the cluster, removed `auto_bootstrap:true` from
cassandra.yaml (so that the default is used), and redeployed, and I'm seeing the
same issue; i.e., I can reproduce it every time. I'm running vanilla Docker
Cassandra 4.0.0 (the image makes almost no changes to the Cassandra distribution). I took
[https://github.com/apache/cassandra/blob/cassandra-4.0/conf/cassandra.yaml]
and the only changes I made to the stock config are:
num_tokens: 8
allocate_tokens_for_local_replication_factor: 2
#auto_bootstrap: true
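For context on what "the default" means here: my understanding (an assumption, not verified against the linked line) is that in 4.0 the effective setting comes from a `cassandra.auto_bootstrap` JVM system property if one is set, and otherwise from the YAML value, which defaults to `true`. The sketch below models only that resolution order; `YamlConfig` and `isAutoBootstrap` are hypothetical stand-ins, not Cassandra's actual classes.

```java
public class AutoBootstrapResolution {
    // Hypothetical stand-in for the YAML-backed config object.
    // Assumption: the YAML field defaults to true when the key is absent.
    static class YamlConfig {
        boolean autoBootstrap = true;
    }

    // Assumed resolution order: the "cassandra.auto_bootstrap" system property,
    // if set, overrides the YAML value; otherwise the YAML value is used.
    static boolean isAutoBootstrap(YamlConfig conf) {
        return Boolean.parseBoolean(
                System.getProperty("cassandra.auto_bootstrap", Boolean.toString(conf.autoBootstrap)));
    }

    public static void main(String[] args) {
        // Key commented out in cassandra.yaml, so the field keeps its default.
        YamlConfig conf = new YamlConfig();
        System.out.println("auto_bootstrap = " + isAutoBootstrap(conf));
    }
}
```

Under these assumptions, commenting the key out resolves to `true`, which would be consistent with seeing identical behavior with and without the explicit `auto_bootstrap: true`.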
# {{delete the cassandra cluster (and remove all data from all nodes)}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k delete sts cassandra}}
{{statefulset.apps "cassandra" deleted}}
# {{modify config (verify pods are gone)}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k apply -f deployments/cassandra/configmap.yaml}}
{{configmap/cassandra-config configured}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k get po}}
{{NAME READY STATUS RESTARTS AGE}}
# {{deploy again}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k apply -f deployments/cassandra/sts.yaml}}
{{statefulset.apps/cassandra created}}
# {{cassandra is initializing}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k get po}}
{{NAME READY STATUS RESTARTS AGE}}
{{cassandra-0 1/1 Running 0 22s}}
{{cassandra-1 1/1 Running 0 21s}}
{{cassandra-2 1/1 Running 0 20s}}
{{cassandra-3 1/1 Running 0 18s}}
# {{the first failure comes 2m after startup; after 8m, one of the nodes had already
restarted 4 times (once I observed it join the cluster overnight, after maybe 1h,
but that happened only once)}}
{{[rcluster] rchyla@tp1:/dvt/workspace/projects/it$ k get po -o wide}}
{{NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES}}
{{cassandra-0 1/1 Running 0 8m32s 10.96.59.213 rbox1 <none> <none>}}
{{cassandra-1 1/1 Running 0 8m31s 10.96.80.145 rbox2 <none> <none>}}
{{cassandra-2 1/1 Running 0 8m30s 10.96.70.86 rbox3 <none> <none>}}
{{cassandra-3 0/1 Error 4 8m28s 10.96.44.144 rbox4 <none> <none>}}
{{log from cassandra-3 (10.96.44.144), without auto_bootstrap:true}}
{{INFO [main] 2021-08-18 15:21:45,537 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.44.144:7000,(14089530831047523,784495759500208690]) exists on Full(/10.96.70.86:7000,(14089530831047523,1554901988169369858]) for keyspace system_auth
INFO [main] 2021-08-18 15:21:45,537 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.44.144:7000,(14089530831047523,784495759500208690]) exists on Full(/10.96.70.86:7000,(14089530831047523,1554901988169369858]) for keyspace system_auth
Exception (java.lang.IllegalStateException) encountered during startup: Multiple strict sources found for Full(/10.96.44.144:7000,(9015306348701926784,-8983433729137907922]), sources: [Full(/10.96.59.213:7000,(8567302352832209875,-8983433729137907922]), Full(/10.96.70.86:7000,(8567302352832209875,-8983433729137907922])]
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.44.144:7000,(9015306348701926784,-8983433729137907922]), sources: [Full(/10.96.59.213:7000,(8567302352832209875,-8983433729137907922]), Full(/10.96.70.86:7000,(8567302352832209875,-8983433729137907922])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
ERROR [main] 2021-08-18 15:21:45,541 CassandraDaemon.java:909 - Exception encountered during startup}}
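The `IllegalStateException` above is raised on the strict-consistency path of `RangeStreamer`. As I understand it (an assumption, simplified), when consistent range movement is enabled a joining node must stream each range from exactly one replica: the one that stops owning the range once the new node joins. The sketch below is a hypothetical model of that check, not Cassandra's code: ranges are reduced to plain endpoint sets, `strictSources` and `pickStrictSource` are invented helpers, and the "after" replica sets are made-up ring views, not values taken from this cluster.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class StrictSourceCheck {
    // Hypothetical helper: replicas that own the range before the join but not after it.
    static Set<String> strictSources(Set<String> oldReplicas, Set<String> newReplicas) {
        Set<String> sources = new LinkedHashSet<>(oldReplicas);
        sources.removeAll(newReplicas); // keep only replicas that hand the range over
        return sources;
    }

    // Simplified strict check: exactly one replica may be the streaming source.
    static String pickStrictSource(String range, Set<String> oldReplicas, Set<String> newReplicas) {
        Set<String> sources = strictSources(oldReplicas, newReplicas);
        if (sources.size() > 1)
            throw new IllegalStateException(
                    "Multiple strict sources found for " + range + ", sources: " + sources);
        if (sources.isEmpty())
            throw new IllegalStateException("No strict source found for " + range);
        return sources.iterator().next();
    }

    public static void main(String[] args) {
        // Replicas of the range before 10.96.44.144 joins (RF=2 assumed).
        Set<String> before = new LinkedHashSet<>(List.of("10.96.59.213", "10.96.70.86"));

        // Single join: one old replica keeps the range, the other hands it over.
        Set<String> afterSingle = Set.of("10.96.44.144", "10.96.59.213");
        System.out.println("source: " + pickStrictSource("r1", before, afterSingle));

        // Made-up ring view in which both old replicas lose the range.
        Set<String> afterOverlap = Set.of("10.96.44.144", "10.96.80.145");
        try {
            pickStrictSource("r2", before, afterOverlap);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model the failure appears whenever the ring view the joiner computes removes the range from more than one old replica at once, which is one way to read the two sources listed in the log.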
> Joining nodes simultaneously with auto_bootstrap:false can cause token
> collision
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Paulo Motta
> Priority: Normal
> Fix For: 4.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both
> bootstrapped successfully despite the colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079,
> and the workaround is to avoid parallel bootstrap when using
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and
> prevent this situation when possible, as it can break users who rely on
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (i.e. add the node to gossip without
> token information)
> 2. wait for gossip to settle for a longer period (i.e. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie-break
> via node id)
> 4. broadcast tokens and move on with bootstrap
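To make the collision and the proposed tie-break concrete, here is a toy model. It assumes, as the description implies, that token allocation is deterministic given the ring a node sees, so two joiners working from the same snapshot pick the same tokens. `allocate` is a hypothetical widest-gap splitter, not the real `allocate_tokens_for_local_rf` algorithm; the token space is shrunk to 0..999,999; gossip, ring delay and failure handling are left out, so only steps 3 and 4 (tie-break via node id, then broadcast) are modeled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class BootstrapTieBreak {
    static final long RING_SIZE = 1_000_000L; // toy token space, not Murmur3's range

    // Hypothetical deterministic allocator: repeatedly splits the widest gap.
    // The same ring view always yields the same tokens. Requires a non-empty ring.
    static List<Long> allocate(TreeSet<Long> ring, int numTokens) {
        TreeSet<Long> view = new TreeSet<>(ring);
        List<Long> chosen = new ArrayList<>();
        for (int n = 0; n < numTokens; n++) {
            long bestStart = view.last();
            long bestGap = view.first() + RING_SIZE - view.last(); // wrap-around gap
            Long prev = null;
            for (long t : view) {
                if (prev != null && t - prev > bestGap) {
                    bestGap = t - prev;
                    bestStart = prev;
                }
                prev = t;
            }
            long token = (bestStart + bestGap / 2) % RING_SIZE;
            view.add(token);
            chosen.add(token);
        }
        return chosen;
    }

    // Parallel bootstrap: every joiner allocates from the same ring snapshot.
    static Map<String, List<Long>> parallel(TreeSet<Long> ring, List<String> joiners, int numTokens) {
        Map<String, List<Long>> out = new TreeMap<>();
        for (String id : joiners) out.put(id, allocate(ring, numTokens));
        return out;
    }

    // Simplified steps 3-4: joiners are ordered by node id, and each allocates
    // only after the tokens of those ahead of it have been "broadcast".
    static Map<String, List<Long>> tieBroken(TreeSet<Long> ring, List<String> joiners, int numTokens) {
        TreeSet<Long> view = new TreeSet<>(ring);
        List<String> order = new ArrayList<>(joiners);
        Collections.sort(order); // tie-break via node id
        Map<String, List<Long>> out = new TreeMap<>();
        for (String id : order) {
            List<Long> tokens = allocate(view, numTokens);
            view.addAll(tokens); // visible to the next joiner
            out.put(id, tokens);
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<Long> ring = new TreeSet<>(List.of(0L, 250_000L, 500_000L, 750_000L));
        List<String> joiners = List.of("node-b", "node-a"); // hypothetical node ids
        System.out.println("parallel:   " + parallel(ring, joiners, 2));
        System.out.println("tie-broken: " + tieBroken(ring, joiners, 2));
    }
}
```

In the parallel run both joiners end up with identical token lists, while the ordered run gives them disjoint ones. Sorting by node id is just one deterministic tie-break; any total order that every joiner computes identically after the ring-delay wait would serve the same purpose.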
--
This message was sent by Atlassian Jira
(v8.3.4#803005)