[
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15601668#comment-15601668
]
Stefan Podkowinski commented on CASSANDRA-12281:
------------------------------------------------
Would you be ok with my approach in the [WIP
branch|https://github.com/apache/cassandra/compare/cassandra-3.0...spodkowinski:WIP-12281],
[~jkni]?
I've implemented two significant changes here:
# TokenMetadata.calculatePendingRanges() will no longer acquire a read lock for
range calculation. This should be fine as long as this method is the only
writer and synchronizes around the ranges map. Without the lock, range
calculation results may lag behind the most recent gossip state, but I think
that's still preferable to not updating the gossip state at all because the
handlers are blocked. Eventually the calculation should catch up.
# The ranges calculation will now only be done once per identical replication
configuration, which should improve performance for clusters with a large
number of keyspaces. It would be nice if [~apeckys] or [~dikanggu] could
confirm whether their affected clusters have a large number of keyspaces, so
we get a better idea of whether this is a major factor.
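To make the two changes above more concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the WIP branch: the class PendingRangeSketch, the ReplicationConfig key, the calculate() helper and the string-based "ranges" are all hypothetical stand-ins, and the real TokenMetadata logic is considerably more involved. The sketch only assumes that the calculation works on a copy of the token state instead of holding a lock, that results are published under a short synchronized block, and that keyspaces with equal replication settings share one calculation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class PendingRangeSketch {

    /** Hypothetical key describing a keyspace's replication settings. */
    static final class ReplicationConfig {
        final String strategy;
        final Map<String, String> options;

        ReplicationConfig(String strategy, Map<String, String> options) {
            this.strategy = strategy;
            this.options = new HashMap<>(options);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ReplicationConfig)) return false;
            ReplicationConfig other = (ReplicationConfig) o;
            return strategy.equals(other.strategy) && options.equals(other.options);
        }

        @Override
        public int hashCode() {
            return Objects.hash(strategy, options);
        }
    }

    /** Counts how often the (assumed expensive) calculation really runs. */
    static int calculations = 0;

    /** Published results; only touched inside synchronized blocks. */
    private static final Map<String, List<String>> pendingRanges = new HashMap<>();

    /** Placeholder for the real range calculation (an assumption). */
    static List<String> calculate(ReplicationConfig config, List<String> tokens) {
        calculations++;
        List<String> ranges = new ArrayList<>();
        for (String token : tokens) {
            ranges.add(config.strategy + config.options + "@" + token);
        }
        return ranges;
    }

    static void calculatePendingRanges(Map<String, ReplicationConfig> keyspaces,
                                       List<String> tokens) {
        // Work on a copy so no lock is held while calculating. The result may
        // lag behind newer state; a later run is expected to catch up.
        List<String> snapshot = new ArrayList<>(tokens);

        // Calculate once per distinct replication configuration.
        Map<ReplicationConfig, List<String>> perConfig = new HashMap<>();
        Map<String, List<String>> perKeyspace = new HashMap<>();
        for (Map.Entry<String, ReplicationConfig> e : keyspaces.entrySet()) {
            List<String> ranges = perConfig.get(e.getValue());
            if (ranges == null) {
                ranges = calculate(e.getValue(), snapshot);
                perConfig.put(e.getValue(), ranges);
            }
            perKeyspace.put(e.getKey(), ranges);
        }

        // Single writer: publish the results under a short critical section.
        synchronized (pendingRanges) {
            pendingRanges.putAll(perKeyspace);
        }
    }

    static List<String> getPendingRanges(String keyspace) {
        synchronized (pendingRanges) {
            return pendingRanges.get(keyspace);
        }
    }

    public static void main(String[] args) {
        Map<String, String> rf3 = new HashMap<>();
        rf3.put("dc1", "3");
        Map<String, ReplicationConfig> keyspaces = new LinkedHashMap<>();
        keyspaces.put("ks1", new ReplicationConfig("NetworkTopologyStrategy", rf3));
        keyspaces.put("ks2", new ReplicationConfig("NetworkTopologyStrategy", rf3));
        keyspaces.put("ks3", new ReplicationConfig("SimpleStrategy", new HashMap<>()));
        List<String> tokens = new ArrayList<>();
        tokens.add("t1");
        tokens.add("t2");
        calculatePendingRanges(keyspaces, tokens);
        System.out.println("keyspaces=" + keyspaces.size()
                + " calculations=" + calculations);
    }
}
```

In this toy setup, three keyspaces with two distinct replication configurations trigger only two calculations. How a consistent copy of the token state is obtained without the read lock is left open here and would need care in the real code.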
Let me know what you think. If we're good, I'll create patches for 2.2 upwards
and kick off the tests.
> Gossip blocks on startup when another node is bootstrapping
> -----------------------------------------------------------
>
> Key: CASSANDRA-12281
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Eric Evans
> Assignee: Stefan Podkowinski
> Attachments: restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are
> less than 1 minute. However, when another node in the cluster is
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool Name                   Active   Pending      Completed   Blocked  All time blocked
> MutationStage                    0         0           1840         0                 0
> ReadStage                        0         0           2350         0                 0
> RequestResponseStage             0         0             53         0                 0
> ReadRepairStage                  0         0              1         0                 0
> CounterMutationStage             0         0              0         0                 0
> HintedHandoff                    0         0             44         0                 0
> MiscStage                        0         0              0         0                 0
> CompactionExecutor               3         3            395         0                 0
> MemtableReclaimMemory            0         0             30         0                 0
> PendingRangeCalculator           1         2             29         0                 0
> GossipStage                      1      5602            164         0                 0
> MigrationStage                   0         0              0         0                 0
> MemtablePostFlush                0         0            111         0                 0
> ValidationExecutor               0         0              0         0                 0
> Sampler                          0         0              0         0                 0
> MemtableFlushWriter              0         0             30         0                 0
> InternalResponseStage            0         0              0         0                 0
> AntiEntropyStage                 0         0              0         0                 0
> CacheCleanupExecutor             0         0              0         0                 0
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 nid=0xea9 waiting on condition [0x00007fddcf883000]
>    java.lang.Thread.State: WAITING (parking)
>         at sun.misc.Unsafe.park(Native Method)
>         - parking to wait for <0x00000004c1e922c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>         at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>         at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>         at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>         at org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>         at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>         at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>         at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> [ ... ]
> {noformat}