[
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634851#comment-16634851
]
Joseph Lynch edited comment on CASSANDRA-14747 at 10/2/18 2:25 AM:
-------------------------------------------------------------------
Ah yeah, I see that's a problem. I worked around it by making a new callback
just for that case. While I was testing it out I also tried flushing
unconditionally
([diff|https://gist.github.com/jolynch/966e0e52f34eff7a7b8ac8d5a9cb4b5d#file-some-more-tweaks-diff-L22])
and CPU usage dropped by about half and the flamegraph looks _excellent_.
I've attached the flamegraph as [^4.0.12-after-unconditional-flush.svg], where
we can see that after the unconditional flush we are now spending less than 7%
CPU usage (compared to roughly 70% before). I think that with 198 other nodes
we were spending a lot of time waiting with unflushed data sitting in the
channel, because there are 195 other queues that get serviced before you get
serviced again and can fill up the channel.
We're not done yet as we still have dropped messages (vs 3.0 which has very few
if any dropped), but this is much better.
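To make the waiting argument above concrete, here is a toy model of it. This is only a sketch under stated assumptions, not the Cassandra or Netty code: it assumes a single thread visits the queues round-robin, that each visit writes exactly one message, and that a conditional flush fires once a fixed number of writes are pending. The function name `flush_delays` and the `flush_threshold` parameter are hypothetical and exist only for this illustration.

```python
def flush_delays(num_queues, flush_threshold, rounds):
    """Toy model: delay (in service turns) between writing and flushing a message.

    Assumptions (not taken from the real code): one thread services
    `num_queues` queues round-robin, our queue writes one message per visit,
    and the channel is flushed once `flush_threshold` writes are pending.
    A threshold of 1 models flushing unconditionally on every visit.
    """
    pending = []  # turn numbers at which still-unflushed messages were written
    delays = []
    for visit in range(rounds):
        # Our queue is only visited again after every other queue had a turn.
        turn = visit * num_queues
        pending.append(turn)
        if len(pending) >= flush_threshold:
            delays.extend(turn - written for written in pending)
            pending.clear()
    return delays


if __name__ == "__main__":
    unconditional = flush_delays(num_queues=196, flush_threshold=1, rounds=10)
    conditional = flush_delays(num_queues=196, flush_threshold=2, rounds=10)
    print("max delay, unconditional flush:", max(unconditional))
    print("max delay, flush after 2 writes:", max(conditional))
```

In this model a message written just before the threshold is reached sits unflushed for a full pass over all the other queues, while the unconditional flush never leaves data waiting. It says nothing about CPU cost, which is what the flamegraph measures.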
> Evaluate 200 node, compression=none, encryption=none, coalescing=off
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
> Issue Type: Sub-task
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Major
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png,
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg,
> 4.0.7-before-my-changes.svg, 4.0_errors_showing_heap_pressure.txt,
> 4.0_heap_histogram_showing_many_MessageOuts.txt,
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg,
> ttop_NettyOutbound-Thread_spinning.txt,
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
> useast1e-i-08635fa1631601538_flamegraph_96node.svg,
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no
> compression, no encryption, no coalescing).