[
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882103#comment-16882103
]
Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 7/10/19 2:19 PM:
-------------------------------------------------------------------------
Thanks [~benedict] and [~jjirsa]. I've re-run the perf test such that
throughput is the same across both clusters (I had to throttle the ndbench
client pointing to the patch version of C* by quite a lot to match trunk
throughput).
I have attached the flamegraphs - CPU usage is a tad lower for the patch vs trunk
(based on the average).
I have also attached all the metrics from this perf run (files starting with perftest2*).
Following is a summary of perf run #2:
* Very similar read ops and write ops
* Read latency (99th percentile and avg) slightly better for patch vs trunk
* Write latency (99th percentile) similar between patch and trunk; write latency
(avg) slightly better for patch vs trunk
* No blocked threadpools for patch
* CPU usage (avg) slightly better for patch vs trunk
* Heap usage pattern similar between patch and trunk
> Message Flusher queue can grow unbounded, potentially running JVM out of memory
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Sumanth Pasupuleti
> Assignee: Sumanth Pasupuleti
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: BlockedEpollEventLoopFromHeapDump.png,
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap
> dump showing each ImmediateFlusher taking upto 600MB.png,
> perftest2_15013_base_flamegraph.svg, perftest2_15013_patch_flamegraph.svg,
> perftest2_blocked_threadpool.png, perftest2_cpu_usage.png,
> perftest2_heap.png, perftest2_read_latency_99th.png,
> perftest2_read_latency_avg.png, perftest2_readops.png,
> perftest2_write_latency_99th.png, perftest2_write_latency_avg.png,
> perftest2_writeops.png, perftest_blockedthreads.png,
> perftest_connections_count.png, perftest_cpu_usage.png,
> perftest_heap_usage.png, perftest_readlatency_99th.png,
> perftest_readlatency_avg.png, perftest_readops.png,
> perftest_writelatency_99th.png, perftest_writelatency_avg.png,
> perftest_writeops.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue
> bounded, since, in the current state, items get added to the queue without any
> check on queue size and without consulting the Netty outbound buffer's
> isWritable state.
> We are seeing this issue hit our production 3.0 clusters quite often.
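As a rough illustration of the approach the ticket describes (not the actual
patch), a flush queue bounded both by a size limit and by Netty's isWritable
backpressure signal could look something like the sketch below; the class,
method, and limit names here are hypothetical:
{code:java}
// Hypothetical sketch only: a flush queue that refuses new work when a size
// limit is reached or when the Netty channel reports backpressure.
import io.netty.channel.Channel;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedFlushQueue
{
    // Assumed limit, purely for illustration.
    private static final int MAX_QUEUED_ITEMS = 1 << 16;

    private final Queue<Object> queued = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    /**
     * Tries to enqueue a response for flushing. Returns false instead of
     * letting the queue grow without bound.
     */
    boolean tryEnqueue(Channel channel, Object flushItem)
    {
        // isWritable() turns false once Netty's outbound buffer crosses its
        // high-water mark, i.e. the client is not keeping up.
        if (!channel.isWritable())
            return false;

        if (size.incrementAndGet() > MAX_QUEUED_ITEMS)
        {
            size.decrementAndGet();
            return false;
        }

        queued.add(flushItem);
        return true;
    }

    Object poll()
    {
        Object item = queued.poll();
        if (item != null)
            size.decrementAndGet();
        return item;
    }
}
{code}
A caller that receives false from tryEnqueue can then apply backpressure to the
client or shed load, rather than letting the queue grow until the JVM runs out
of memory.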