[ https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588617#comment-15588617 ]
Stefan Podkowinski commented on CASSANDRA-12281:
------------------------------------------------
I just took a closer look at the source and stack dump and it also looks to me
as if the pending ranges calculation is blocking all gossip tasks from
executing. But this should not only affect a starting node, as the pending
ranges calculation should take place on every node in the cluster after
relevant changes in the cluster topology. It should also always take about the
same amount of time to complete. You should be able to verify this by
increasing the log level of
{{org.apache.cassandra.service.PendingRangeCalculatorService}} and looking for a
"finished calculation for" log message, which reports how long the range
calculation took. The number of pending GossipStage tasks should
also increase on each node after the new node starts bootstrapping.
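
To illustrate the suspected mechanism, here is a minimal, self-contained sketch. It is not Cassandra code. It assumes that the pending ranges calculation holds the read side of the same fair {{ReentrantReadWriteLock}} that {{TokenMetadata.updateNormalTokens}} wants to write-lock, which the stack dump suggests but which I have not confirmed. The class and method names ({{GossipBlockSketch}}, {{pendingWhileReadLockHeld}}) are made up for the example.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the situation in the stack dump, not Cassandra code.
class GossipBlockSketch {

    /**
     * Submits {@code submitted} tasks to a single-threaded stage while another
     * thread holds the read lock, and returns how many tasks are still queued.
     */
    static int pendingWhileReadLockHeld(int submitted) throws InterruptedException {
        // Fair lock, matching the FairSync seen in the thread dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        // One worker thread with an unbounded queue (assumed shape of a gossip-like stage).
        ThreadPoolExecutor stage = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        // Stand-in for a long-running calculation holding the read lock (assumption).
        lock.readLock().lock();
        try {
            for (int i = 0; i < submitted; i++) {
                stage.execute(() -> {
                    // Each task needs the write lock, like updateNormalTokens in the dump.
                    lock.writeLock().lock();
                    try {
                        // The token metadata update would happen here.
                    } finally {
                        lock.writeLock().unlock();
                    }
                });
            }
            // The first task went straight to the only worker and cannot finish
            // until the read lock is released; everything else sits in the queue.
            return stage.getQueue().size();
        } finally {
            lock.readLock().unlock();
            stage.shutdown();
            stage.awaitTermination(5, TimeUnit.SECONDS);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pending tasks: " + pendingWhileReadLockHeld(5));
    }
}
```

As long as the read lock is held, the single worker stays parked on the write lock and the queue can only grow, which would match the Active 1 / Pending 5602 figures for GossipStage in the tpstats output.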
As can be seen in the stack dump, the startup process is stuck in
{{CassandraDaemon.waitForGossipToSettle()}}, where the number of pending tasks
is checked on the assumption that we'll be able to continue starting up once all
of them have been completed. You should be able to find [related INFO
messages|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/CassandraDaemon.java#L588]
in the log.
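
For readers unfamiliar with that kind of check, the following is a simplified sketch of a "wait until the queue is quiet" loop. It is not the actual {{waitForGossipToSettle()}} implementation; the class name, method name and all parameters are hypothetical and only show why a backlog that never drains keeps startup waiting.

```java
import java.util.function.LongSupplier;

// Hypothetical, simplified settle loop; not the CassandraDaemon implementation.
class SettleLoopSketch {

    /**
     * Polls the pending-task count until it has been zero for
     * {@code requiredQuietPolls} consecutive polls.
     *
     * @return the number of polls taken, or -1 if {@code maxPolls} was reached first
     */
    static int pollsUntilSettled(LongSupplier pendingTasks,
                                 int requiredQuietPolls,
                                 int maxPolls,
                                 long pollIntervalMillis) throws InterruptedException {
        int quietPolls = 0;
        for (int poll = 1; poll <= maxPolls; poll++) {
            Thread.sleep(pollIntervalMillis);
            if (pendingTasks.getAsLong() == 0) {
                quietPolls++;
                if (quietPolls == requiredQuietPolls) {
                    return poll; // queue stayed empty long enough
                }
            } else {
                quietPolls = 0; // any backlog resets the quiet streak
            }
        }
        return -1; // never settled within the allowed number of polls
    }

    public static void main(String[] args) throws InterruptedException {
        // A backlog that never drains, like the blocked stage in the report.
        System.out.println(pollsUntilSettled(() -> 5602L, 3, 4, 10));
    }
}
```

With a constant non-zero backlog the quiet streak never starts, so a loop of this shape only ends when it runs out of polls.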
Is this really the case? Can a bootstrapping node trigger a very slow
pending ranges calculation on all nodes that would effectively shut down all
gossip in the cluster?
> Gossip blocks on startup when another node is bootstrapping
> -----------------------------------------------------------
>
> Key: CASSANDRA-12281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Eric Evans
> Assignee: Joel Knighton
> Priority: Minor
> Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are
> less than 1 minute. However, when another node in the cluster is
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool Name                    Active   Pending   Completed   Blocked   All time blocked
> MutationStage                     0         0        1840         0                  0
> ReadStage                         0         0        2350         0                  0
> RequestResponseStage              0         0          53         0                  0
> ReadRepairStage                   0         0           1         0                  0
> CounterMutationStage              0         0           0         0                  0
> HintedHandoff                     0         0          44         0                  0
> MiscStage                         0         0           0         0                  0
> CompactionExecutor                3         3         395         0                  0
> MemtableReclaimMemory             0         0          30         0                  0
> PendingRangeCalculator            1         2          29         0                  0
> GossipStage                       1      5602         164         0                  0
> MigrationStage                    0         0           0         0                  0
> MemtablePostFlush                 0         0         111         0                  0
> ValidationExecutor                0         0           0         0                  0
> Sampler                           0         0           0         0                  0
> MemtableFlushWriter               0         0          30         0                  0
> InternalResponseStage             0         0           0         0                  0
> AntiEntropyStage                  0         0           0         0                  0
> CacheCleanupExecutor              0         0           0         0                  0
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 waiting on condition [0x00007fddcf883000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000004c1e922c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>         at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>         at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>         at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>         at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> [ ... ]
> {noformat}