[ 
https://issues.apache.org/jira/browse/CASSANDRA-12281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15637528#comment-15637528
 ] 

Joel Knighton commented on CASSANDRA-12281:
-------------------------------------------

Thanks for the patch and your patience as I get to this for review! I've been 
quite busy lately.

The approach overall seems sound. While calculating pending ranges can be a 
little slow, I don't think we risk falling too far behind, because the huge 
delays here appear to be a result of cascading delays to other tasks. The 
PendingRangeCalculatorService's restriction on one queued task that will 
reflect cluster state at time of execution helps with this.

A few small questions/nits:
- Is there a reason that the test is excluded from the 2.2 branch? Byteman is 
available for tests on the 2.2 branch since [CASSANDRA-12377], and I don't see 
anything else that stops the test from being useful there.
- Generally, the tests are organized as a top-level class for some entity or 
fundamental operation in the codebase and then specific test methods for unit 
tests/regression tests. I think it would make sense to establish a 
{{PendingRangeCalculatorServiceTest}} and introduce the specific test for 
[CASSANDRA-12281] inside that class.
- In the {{PendingRangeCalculatorService}}, I'm not sure we need to move the 
"Finished calculation for ..." log message to trace. Most Gossip/TokenMetadata 
state changes are logged at debug, especially when they reflect some detail 
about the aggregate state of an operation.
- A few minor spelling fixes in the test "aquire" -> "acquire", "fist" -> 
"first". (Note that I normally wouldn't bother with these, but since the test 
could likely use a few other changes, I think it is worthwhile to fix these.)
- In the test's setUp, the call to {{Keyspace.setInitialized}} is redundant. 
The call to {{SchemaLoader.prepareServer}} will already perform this.
- CI looks good overall. The 3.0-dtest run has a few materialized view dtest 
failures that are likely unrelated, but it would be good if you could retrigger 
CI for at least this branch.
- There's no CI/branch posted for the 3.X series. While this has barely 
diverged from trunk at this point, it'd be nice if you could run CI for this 
branch.

Thanks again.

> Gossip blocks on startup when another node is bootstrapping
> -----------------------------------------------------------
>
>                 Key: CASSANDRA-12281
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12281
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Eric Evans
>            Assignee: Stefan Podkowinski
>         Attachments: 12281-2.2.patch, 12281-3.0.patch, 12281-trunk.patch, 
> restbase1015-a_jstack.txt
>
>
> In our cluster, normal node startup times (after a drain on shutdown) are 
> less than 1 minute.  However, when another node in the cluster is 
> bootstrapping, the same node startup takes nearly 30 minutes to complete, the 
> apparent result of gossip blocking on pending range calculations.
> {noformat}
> $ nodetool-a tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All 
> time blocked
> MutationStage                     0         0           1840         0        
>          0
> ReadStage                         0         0           2350         0        
>          0
> RequestResponseStage              0         0             53         0        
>          0
> ReadRepairStage                   0         0              1         0        
>          0
> CounterMutationStage              0         0              0         0        
>          0
> HintedHandoff                     0         0             44         0        
>          0
> MiscStage                         0         0              0         0        
>          0
> CompactionExecutor                3         3            395         0        
>          0
> MemtableReclaimMemory             0         0             30         0        
>          0
> PendingRangeCalculator            1         2             29         0        
>          0
> GossipStage                       1      5602            164         0        
>          0
> MigrationStage                    0         0              0         0        
>          0
> MemtablePostFlush                 0         0            111         0        
>          0
> ValidationExecutor                0         0              0         0        
>          0
> Sampler                           0         0              0         0        
>          0
> MemtableFlushWriter               0         0             30         0        
>          0
> InternalResponseStage             0         0              0         0        
>          0
> AntiEntropyStage                  0         0              0         0        
>          0
> CacheCleanupExecutor              0         0              0         0        
>          0
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {noformat}
> A full thread dump is attached, but the relevant bit seems to be here:
> {noformat}
> [ ... ]
> "GossipStage:1" #1801 daemon prio=5 os_prio=0 tid=0x00007fe4cd54b000 
> nid=0xea9 waiting on condition [0x00007fddcf883000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x00000004c1e922c0> (a 
> java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
>       at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
>       at 
> java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
>       at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:174)
>       at 
> org.apache.cassandra.locator.TokenMetadata.updateNormalTokens(TokenMetadata.java:160)
>       at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2023)
>       at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1682)
>       at 
> org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1182)
>       at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1165)
>       at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1128)
>       at 
> org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:58)
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> [ ... ]
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to