[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634851#comment-16634851 ]
Joseph Lynch edited comment on CASSANDRA-14747 at 10/2/18 2:25 AM:
-------------------------------------------------------------------

Ah yea, I see that's a problem. I worked around it by making a new callback just for that case. While I was testing it out I also tested flushing unconditionally ([diff|https://gist.github.com/jolynch/966e0e52f34eff7a7b8ac8d5a9cb4b5d#file-some-more-tweaks-diff-L22]), and CPU usage dropped by about half and the flamegraph looks _excellent_. I've attached the flamegraph as [^4.0.12-after-unconditional-flush.svg], where we can see that after the unconditional flush we are spending less than 7% CPU usage now (compared to like 70%).

I think that with 198 other nodes we were spending a lot of time waiting with unflushed data in the channel, because there are 195 other queues that get to be serviced before you get serviced again and fill up the channel.

We're not done yet, as we still have dropped messages (vs 3.0, which has very few if any dropped), but this is much better.
> Evaluate 200 node, compression=none, encryption=none, coalescing=off
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-14747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Major
>         Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png,
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg,
> 4.0.7-before-my-changes.svg, 4.0_errors_showing_heap_pressure.txt,
> 4.0_heap_histogram_showing_many_MessageOuts.txt,
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg,
> ttop_NettyOutbound-Thread_spinning.txt,
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg,
> useast1e-i-08635fa1631601538_flamegraph_96node.svg,
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes,
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no
> compression, no encryption, no coalescing).