[ 
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588617#comment-15588617
 ] 

Stefan Podkowinski commented on CASSANDRA-12281:
------------------------------------------------

I just took a closer look at the source and stack dump and it also looks to me 
as if the pending ranges calculation is blocking all gossip tasks from 
executing. But this should not only affect a starting node, as the pending 
ranges calculation should take place on every node in the cluster after 
relevant changes in the cluster topology. It should also always take about the 
same amount of time before completion. You should be able to verify this by 
increasing the log level of 
{{org.apache.cassandra.service.PendingRangeCalculatorService}} and looking for a 
"finished calculation for" log message with the amount of time it took to 
finish the range calculation. The number of pending GossipStage tasks should 
also increase on each node after the new node starts bootstrapping.
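
To illustrate the suspected mechanism, here is a minimal standalone Java sketch 
(not Cassandra code). It assumes that a long-running task holds the read lock 
of a fair ReentrantReadWriteLock for its whole duration, while tasks on a 
single-threaded executor need the write lock, as the GossipStage thread in the 
stack dump does. The class and method names are made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class GossipBlockSketch {

    /** Returns how many submitted tasks are still queued while the read lock is held. */
    static int pendingWhileCalculating(int gossipTasks) throws InterruptedException {
        // Fair lock, matching the ReentrantReadWriteLock$FairSync seen in the stack dump.
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
        // Single worker thread standing in for GossipStage.
        ThreadPoolExecutor gossipStage = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch readLockHeld = new CountDownLatch(1);
        CountDownLatch finishCalculation = new CountDownLatch(1);

        // Stand-in for a slow calculation that keeps the read lock (assumption).
        Thread calculator = new Thread(() -> {
            lock.readLock().lock();
            try {
                readLockHeld.countDown();
                finishCalculation.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "PendingRangeCalculator");
        calculator.start();
        readLockHeld.await();

        // Each task needs the write lock, so the first one parks the only
        // worker thread and all later tasks pile up in the queue.
        for (int i = 0; i < gossipTasks; i++) {
            gossipStage.execute(() -> {
                lock.writeLock().lock();
                try {
                    // a token metadata update would happen here
                } finally {
                    lock.writeLock().unlock();
                }
            });
        }
        int pending = gossipStage.getQueue().size();

        // Let the calculation finish so the queued tasks can drain.
        finishCalculation.countDown();
        calculator.join();
        gossipStage.shutdown();
        gossipStage.awaitTermination(10, TimeUnit.SECONDS);
        return pending;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pending while blocked: " + pendingWhileCalculating(5));
    }
}
```

With five submitted tasks, one occupies the worker thread waiting for the write 
lock and the other four stay pending until the read lock is released.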

As can be seen in the stack dump, the startup process is stuck in 
{{CassandraDaemon.waitForGossipToSettle()}}, where the number of pending tasks 
is checked on the assumption that we'll be able to continue starting up once all 
of them have been completed. You should be able to find [related INFO 
messages|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/service/CassandraDaemon.java#L588]
 in the log.
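
The settle check can be pictured as a polling loop like the simplified sketch 
below. This is not the actual CassandraDaemon code: the method name, its 
parameters and the rule of requiring several consecutive quiet polls are 
assumptions made for illustration.

```java
import java.util.function.IntSupplier;

// Hypothetical illustration only; not the CassandraDaemon implementation.
final class GossipSettleSketch {

    /**
     * Polls the pending task count until it has been zero for
     * successesRequired consecutive polls. Returns the number of polls
     * needed, or -1 if gossip did not settle within maxPolls.
     */
    static int pollsUntilSettled(IntSupplier pendingTasks, int successesRequired,
                                 int maxPolls, long pollIntervalMs) throws InterruptedException {
        int consecutiveQuietPolls = 0;
        for (int poll = 1; poll <= maxPolls; poll++) {
            Thread.sleep(pollIntervalMs);
            if (pendingTasks.getAsInt() == 0) {
                consecutiveQuietPolls++;
                if (consecutiveQuietPolls == successesRequired) {
                    return poll;
                }
            } else {
                // Any pending task restarts the count of quiet polls.
                consecutiveQuietPolls = 0;
            }
        }
        return -1;
    }
}
```

If the pending count never reaches zero, for example because the single gossip 
thread is parked on a lock, a loop of this shape keeps polling until it gives 
up, which would match a startup that appears to hang.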

Is this really the case? Can a bootstrapping node trigger a very slow 
pending ranges calculation on all nodes that would effectively shut down all 
gossip in the cluster?

> Gossip blocks on startup when another node is bootstrapping
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12281
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Eric Evans
>            Assignee: Joel Knighton
>            Priority: Minor
>         Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are 
> less than 1 minute.  However, when another node in the cluster is 
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the 
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MutationStage                     0         0           1840         0                 0
> ReadStage                         0         0           2350         0                 0
> RequestResponseStage              0         0             53         0                 0
> ReadRepairStage                   0         0              1         0                 0
> CounterMutationStage              0         0              0         0                 0
> HintedHandoff                     0         0             44         0                 0
> MiscStage                         0         0              0         0                 0
> CompactionExecutor                3         3            395         0                 0
> MemtableReclaimMemory             0         0             30         0                 0
> PendingRangeCalculator            1         2             29         0                 0
> GossipStage                       1      5602            164         0                 0
> MigrationStage                    0         0              0         0                 0
> MemtablePostFlush                 0         0            111         0                 0
> ValidationExecutor                0         0              0         0                 0
> Sampler                           0         0              0         0                 0
> MemtableFlushWriter               0         0             30         0                 0
> InternalResponseStage             0         0              0         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> CacheCleanupExecutor              0         0              0         0                 0
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 waiting on condition [0x00007fddcf883000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000004c1e922c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>       at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>       at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>       at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>       at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>       at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>       at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>       at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>       at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>       at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>       at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> [ ... ]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
