[
https://issues.apache.org/jira/browse/CASSANDRA-16364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17401090#comment-17401090
]
Roman commented on CASSANDRA-16364:
-----------------------------------
I have hit this issue running 4.0 (inside k8s, and therefore starting 4 instances in
parallel).
It seems that `auto_bootstrap: true` is not actually the default, contrary to what one
of the comments suggests. In my case, without the option set, one machine eventually
joined the cluster (after 10 restarts); but I also observed a situation where a cluster
was up for a day and one of the machines restarted hundreds of times (always with a
token conflict).
With `auto_bootstrap: true`, the 4 instances start in parallel, and two of them
restart 1-2 times (due to a bootstrap conflict, but that seems to be a separate issue
from the one above).
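For clarity, this is roughly the cassandra.yaml fragment in question (a minimal sketch, not my full config; the allocation option is shown with what I believe is the stock 4.0 value):
```
# cassandra.yaml (sketch) -- only the settings discussed here.
# auto_bootstrap is documented as defaulting to true when omitted, but I set it explicitly:
auto_bootstrap: true
# 4.0's token-allocation option this ticket is about:
allocate_tokens_for_local_replication_factor: 3
```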
This was the error before setting `auto_bootstrap: true`:
```
INFO [main] 2021-08-18 02:38:29,032 NetworkTopologyStrategy.java:88 - Configured datacenter replicas are datacenter1:rf(2)
INFO [main] 2021-08-18 02:38:29,034 TokenAllocatorFactory.java:44 - Using ReplicationAwareTokenAllocator.
INFO [main] 2021-08-18 02:38:29,122 TokenAllocation.java:106 - Selected tokens [-869047834665074658, 6571578339392131746, -5974523007943185192, -3644355145115701774, 3287046338630430582, -2401348872989035546, 1849708238101167874, -4749797269495265510]
INFO [main] 2021-08-18 02:38:29,129 StorageService.java:1619 - JOINING: sleeping 30000 ms for pending range setup
INFO [main] 2021-08-18 02:38:59,130 StorageService.java:1619 - JOINING: Starting to bootstrap...
INFO [main] 2021-08-18 02:38:59,147 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(5801172110722970579,6571578339392131746]) exists on Full(/10.96.44.142:7000,(5801172110722970579,7341984568061292914]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-4092359140985418682,-3644355145115701774]) exists on Full(/10.96.59.211:7000,(-4092359140985418682,-3196351149245984865]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-3196351149245984865,-2401348872989035546]) exists on Full(/10.96.59.211:7000,(-3196351149245984865,-1606346596732086227]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(990822151481071145,1849708238101167874]) exists on Full(/10.96.44.142:7000,(990822151481071145,2708594324721264603]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-1606346596732086227,-869047834665074658]) exists on Full(/10.96.44.142:7000,(-1606346596732086227,-131749072598063088]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-6541810617881258046,-5974523007943185192]) exists on Full(/10.96.59.211:7000,(-6541810617881258046,-5407235398005112337]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(2708594324721264603,3287046338630430582]) exists on Full(/10.96.59.211:7000,(2708594324721264603,3865498352539596562]) for keyspace system_auth
INFO [main] 2021-08-18 02:38:59,148 RangeStreamer.java:330 - Bootstrap: range Full(/10.96.70.81:7000,(-5407235398005112337,-4749797269495265510]) exists on Full(/10.96.44.142:7000,(-5407235398005112337,-4092359140985418682]) for keyspace system_auth
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
Exception (java.lang.IllegalStateException) encountered during startup: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
ERROR [main] 2021-08-18 02:38:59,153 CassandraDaemon.java:909 - Exception encountered during startup
java.lang.IllegalStateException: Multiple strict sources found for Full(/10.96.70.81:7000,(8312940956965586630,-9117317883097463910]), sources: [Full(/10.96.44.142:7000,(8312940956965586630,-9117317883097463910]), Full(/10.96.59.211:7000,(8312940956965586630,-9117317883097463910])]
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:542)
    at org.apache.cassandra.dht.RangeStreamer.calculateRangesToFetchWithPreferredEndpoints(RangeStreamer.java:408)
    at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:327)
    at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:83)
    at org.apache.cassandra.service.StorageService.startBootstrap(StorageService.java:1785)
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1762)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1056)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:1017)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:799)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:729)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:420)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:763)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:887)
INFO [StorageServiceShutdownHook] 2021-08-18 02:38:59,224 HintsService.java:220 - Paused hints dispatch
```
(after which the Cassandra pod restarts)
> Joining nodes simultaneously with auto_bootstrap:false can cause token
> collision
> --------------------------------------------------------------------------------
>
> Key: CASSANDRA-16364
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16364
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Paulo Motta
> Priority: Normal
> Fix For: 4.0.x
>
>
> While raising a 6-node ccm cluster to test 4.0-beta4, 2 nodes chose the same
> tokens using the default {{allocate_tokens_for_local_rf}}. However, they both
> succeeded in bootstrapping with colliding tokens.
> We were familiar with this issue from CASSANDRA-13701 and CASSANDRA-16079,
> and the workaround to fix this is to avoid parallel bootstrap when using
> {{allocate_tokens_for_local_rf}}.
> However, since this is the default behavior, we should try to detect and
> prevent this situation when possible, as it can break users who rely on
> parallel bootstrap.
> I think we could prevent this as follows:
> 1. announce intent to bootstrap via gossip (ie. add node on gossip without
> token information)
> 2. wait for gossip to settle for a longer period (ie. ring delay)
> 3. allocate tokens (if multiple bootstrap attempts are detected, tie break
> via node-id)
> 4. broadcast tokens and move on with bootstrap
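As an illustration of step 3 of the proposal above, here is a minimal sketch of the node-id tie-break; the class and method names are hypothetical and this is not the actual Cassandra implementation:
```
// Hypothetical sketch of step 3's tie-break (not Cassandra's actual code): after announcing
// intent via gossip and waiting out ring delay, a joining node keeps a contested token only
// if its host id wins against every other claimant seen in gossip.
import java.util.Map;
import java.util.Set;
import java.util.UUID;

final class BootstrapTieBreak
{
    /**
     * @param localId       this node's host id
     * @param localTokens   tokens this node has tentatively allocated
     * @param claimedTokens tokens claimed by other joining nodes, as observed via gossip
     * @return true if this node may broadcast its tokens, false if it must re-allocate
     */
    static boolean mayKeepTokens(UUID localId, Set<Long> localTokens, Map<Long, UUID> claimedTokens)
    {
        for (Long token : localTokens)
        {
            UUID claimant = claimedTokens.get(token);
            // lower host id wins the contested token; the loser re-runs allocation
            if (claimant != null && claimant.compareTo(localId) < 0)
                return false;
        }
        return true;
    }
}
```
The losing node would simply re-run allocation against the now-visible claims before broadcasting, so at most one joining node ever announces a contested token.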