[ https://issues.apache.org/jira/browse/CASSANDRA-11363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199944#comment-15199944 ]
Carlos Rolo commented on CASSANDRA-11363:
-----------------------------------------
I can confirm that batches are in use in both of my clusters.
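To illustrate the kind of batched write in play here, below is a minimal sketch submitted through cqlsh; the keyspace, table, and values are hypothetical and not taken from these clusters:
{code}
# Illustrative only: a batched write sent over the native protocol.
# Keyspace/table/column names are made up for this sketch.
cqlsh <<'EOF'
-- Logged or unlogged batches both arrive via the Native-Transport-Requests pool.
BEGIN UNLOGGED BATCH
  INSERT INTO my_ks.events (id, payload) VALUES (1, 'first');
  INSERT INTO my_ks.events (id, payload) VALUES (2, 'second');
APPLY BATCH;
EOF
{code}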
> Blocked NTR When Connecting Causing Excessive Load
> --------------------------------------------------
>
> Key: CASSANDRA-11363
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11363
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Russell Bradberry
> Attachments: cassandra-102-cms.stack, cassandra-102-g1gc.stack
>
>
> After upgrading from 2.1.9 to 2.1.13, we are seeing an issue where machine
> load increases to very high levels (> 120 on an 8-core machine) and native
> transport requests show as blocked in tpstats.
> I was able to reproduce this with both CMS and G1GC, as well as on JVM 7 and 8.
> The issue does not seem to affect the nodes still running 2.1.9.
> The issue seems to coincide with either the number of connections or the total
> number of requests being processed at a given time (the latter increases with
> the former in our system).
> Currently there are between 600 and 800 client connections on each machine, and
> each machine is handling roughly 2,000-3,000 client requests per second.
> Disabling the binary protocol fixes the issue for this node but isn't a
> viable option cluster-wide.
> Here is the output from nodetool tpstats:
> {code}
> Pool Name                    Active   Pending   Completed   Blocked  All time blocked
> MutationStage                     0         8     8387821         0                 0
> ReadStage                         0         0      355860         0                 0
> RequestResponseStage              0         7     2532457         0                 0
> ReadRepairStage                   0         0         150         0                 0
> CounterMutationStage             32       104      897560         0                 0
> MiscStage                         0         0           0         0                 0
> HintedHandoff                     0         0          65         0                 0
> GossipStage                       0         0        2338         0                 0
> CacheCleanupExecutor              0         0           0         0                 0
> InternalResponseStage             0         0           0         0                 0
> CommitLogArchiver                 0         0           0         0                 0
> CompactionExecutor                2       190         474         0                 0
> ValidationExecutor                0         0           0         0                 0
> MigrationStage                    0         0          10         0                 0
> AntiEntropyStage                  0         0           0         0                 0
> PendingRangeCalculator            0         0         310         0                 0
> Sampler                           0         0           0         0                 0
> MemtableFlushWriter               1        10          94         0                 0
> MemtablePostFlush                 1        34         257         0                 0
> MemtableReclaimMemory             0         0          94         0                 0
> Native-Transport-Requests       128       156      387957        16            278451
>
> Message type           Dropped
> READ                         0
> RANGE_SLICE                  0
> _TRACE                       0
> MUTATION                     0
> COUNTER_MUTATION             0
> BINARY                       0
> REQUEST_RESPONSE             0
> PAGED_RANGE                  0
> READ_REPAIR                  0
> {code}
> Attached is the jstack output for both CMS and G1GC.
> Flight recordings are here:
> https://s3.amazonaws.com/simple-logs/cassandra-102-cms.jfr
> https://s3.amazonaws.com/simple-logs/cassandra-102-g1gc.jfr
> Interestingly, while the flight recording was in progress the machine load
> dropped back to healthy levels, and once the recording finished it climbed
> back above 100.
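For reference, the blocked NTR counter and the per-node workaround mentioned in the description can be exercised with standard nodetool commands; this is a minimal sketch for illustration rather than part of the original report:
{code}
# Watch the Native-Transport-Requests pool (the Blocked and All time blocked
# columns) while the node is under load.
watch -n 5 'nodetool tpstats | grep -E "Pool Name|Native-Transport-Requests"'

# Per-node workaround described above: stop accepting native (binary) protocol
# connections on this node, then re-enable once load has recovered.
nodetool disablebinary
nodetool enablebinary
{code}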