[
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881762#comment-16881762
]
Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 7/10/19 6:34 AM:
-------------------------------------------------------------------------
Performance tests were run against two C* clusters: one running latest trunk
(referred to as _cass_perf_15013_base_), and one running latest trunk plus the
15013 patch (referred to as _cass_perf_15013_patch_). Two NDBench clusters,
configured similarly so as to emit similar traffic, were set up to generate load
against each of the C* clusters. Each C* cluster consists of six i3.8xl nodes in
a single region, and each NDBench cluster consists of 450 nodes.
The following is an analysis of the perf run:
# No blocked thread pool with the patch, versus a blocked thread pool on trunk
!perftest_blockedthreads.png!
# Similar writeops
!perftest_writeops.png!
# Patch serves more readops than trunk
!perftest_readops.png!
# Comparable read and write latencies (99th and avg)
!perftest_readlatency_99th.png!
!perftest_readlatency_avg.png!
!perftest_writelatency_99th.png!
!perftest_writelatency_avg.png!
# Comparable CPU usage
!perftest_cpu_usage.png!
# Comparable heap usage
!perftest_heap_usage.png!
# Connections count (~1000 connections per C* node)
!perftest_connections_count.png!
> Message Flusher queue can grow unbounded, potentially running JVM out of
> memory
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Sumanth Pasupuleti
> Assignee: Sumanth Pasupuleti
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: BlockedEpollEventLoopFromHeapDump.png,
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap
> dump showing each ImmediateFlusher taking upto 600MB.png,
> perftest_blockedthreads.png, perftest_connections_count.png,
> perftest_cpu_usage.png, perftest_heap_usage.png,
> perftest_readlatency_99th.png, perftest_readlatency_avg.png,
> perftest_readops.png, perftest_writelatency_99th.png,
> perftest_writelatency_avg.png, perftest_writeops.png
>
>
> This is a follow-up ticket to CASSANDRA-14855, to make the Flusher queue
> bounded. Currently, items are added to the queue without any check on the
> queue size, and without checking the Netty outbound buffer's isWritable state.
> We are hitting this issue quite often on our production 3.0 clusters.
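To illustrate the two checks the ticket asks for (a queue-size bound and the outbound channel's writability), here is a minimal sketch assuming a simple in-memory queue guarded by a lock. `BoundedFlushQueue`, `offer`, and `drainTo` are hypothetical names, not Cassandra's actual Flusher API, and the `BooleanSupplier` is an assumed stand-in for Netty's `Channel.isWritable()`.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical bounded flush queue (illustration only, not Cassandra's Flusher).
 * Instead of growing without limit, offer() rejects new items when the queue
 * is full or when the outbound channel reports that it is not writable.
 */
class BoundedFlushQueue<T> {
    private final int capacity;
    // Assumed stand-in for Netty's Channel.isWritable(); supplied by the caller.
    private final BooleanSupplier channelWritable;
    private final Queue<T> queue = new ArrayDeque<>();

    BoundedFlushQueue(int capacity, BooleanSupplier channelWritable) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.channelWritable = channelWritable;
    }

    /** Returns false (signalling backpressure) instead of letting the queue grow unbounded. */
    synchronized boolean offer(T item) {
        if (queue.size() >= capacity || !channelWritable.getAsBoolean()) {
            return false;
        }
        return queue.add(item);
    }

    /** Moves up to maxItems queued items into sink for one flush; returns how many were moved. */
    synchronized int drainTo(List<? super T> sink, int maxItems) {
        int moved = 0;
        while (moved < maxItems && !queue.isEmpty()) {
            sink.add(queue.poll());
            moved++;
        }
        return moved;
    }

    synchronized int size() {
        return queue.size();
    }
}
```

With a capacity of 2, a third `offer` fails until a flush drains the queue, and any `offer` fails while the writability check returns false. How a rejected item is then handled (for example, failing the request with an overloaded error) is a separate design decision this sketch does not address.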