Eric Evans created CASSANDRA-12281:
--------------------------------------

             Summary: Gossip blocks on startup when another node is 
bootstrapping
                 Key: CASSANDRA-12281
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Eric Evans
            Priority: Minor
         Attachments: restbase1015-a_jstack.txt

In our cluster, normal node startup times (after a drain on shutdown) are less 
than 1 minute.  However, when another node in the cluster is bootstrapping, the 
same node startup takes nearly 30 minutes to complete, the apparent result of 
gossip blocking on pending range calculations.

{noformat}
$ nodetool-a tpstats
Pool Name                    Active   Pending      Completed   Blocked  All 
time blocked
MutationStage                     0         0           1840         0          
       0
ReadStage                         0         0           2350         0          
       0
RequestResponseStage              0         0             53         0          
       0
ReadRepairStage                   0         0              1         0          
       0
CounterMutationStage              0         0              0         0          
       0
HintedHandoff                     0         0             44         0          
       0
MiscStage                         0         0              0         0          
       0
CompactionExecutor                3         3            395         0          
       0
MemtableReclaimMemory             0         0             30         0          
       0
PendingRangeCalculator            1         2             29         0          
       0
GossipStage                       1      5602            164         0          
       0
MigrationStage                    0         0              0         0          
       0
MemtablePostFlush                 0         0            111         0          
       0
ValidationExecutor                0         0              0         0          
       0
Sampler                           0         0              0         0          
       0
MemtableFlushWriter               0         0             30         0          
       0
InternalResponseStage             0         0              0         0          
       0
AntiEntropyStage                  0         0              0         0          
       0
CacheCleanupExecutor              0         0              0         0          
       0

Message type           Dropped
READ                         0
RANGE_SLICE                  0
_TRACE                       0
MUTATION                     0
COUNTER_MUTATION             0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0
{noformat}

A full thread dump is attached, but the relevant bit seems to be here:

{noformat}
[ ... ]

"GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 
waiting on condition [0x00007fddcf883000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000004c1e922c0> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
        at 
org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
        at 
org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
        at 
org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
        at 
org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
        at 
org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
        at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
        at 
org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
        at 
org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
        at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

[ ... ]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to