[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16634301#comment-16634301 ]

Jason Brown commented on CASSANDRA-14747:
-----------------------------------------

[~jolynch] Nice work. I agree the time bounding of dequeueMessages is somewhat 
questionable; I added it when we were making a bunch of other changes to deal 
with CPU/task starvation. 
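For readers unfamiliar with the idea, here is a minimal sketch of what "time 
bounding" a drain loop means. It is NOT Cassandra's code: the method name 
mirrors dequeueMessages from the comment, but the body, the injectable clock, 
and the per-call budget are assumptions for illustration. The loop stops either 
when the backlog is empty or when the budget is spent, leaving any remainder 
for a later run.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Cassandra's code) of a time-bounded drain loop:
 * process queued messages until the backlog is empty or a per-call time
 * budget is spent, so one consumer cannot monopolize its thread.
 */
public class TimeBoundedDrain {
    private final Queue<String> backlog = new ArrayDeque<>();
    private final LongSupplier nanoClock; // injectable clock, e.g. System::nanoTime
    private final long budgetNanos;       // assumed per-call time budget

    public TimeBoundedDrain(LongSupplier nanoClock, long budgetNanos) {
        this.nanoClock = nanoClock;
        this.budgetNanos = budgetNanos;
    }

    public void enqueue(String message) { backlog.add(message); }

    public int backlogSize() { return backlog.size(); }

    /** Drains messages into {@code sink}; returns how many were processed. */
    public int dequeueMessages(Consumer<String> sink) {
        long deadline = nanoClock.getAsLong() + budgetNanos;
        int processed = 0;
        String message;
        // Compare via subtraction so the check stays correct if nanoTime wraps.
        while (nanoClock.getAsLong() - deadline < 0 && (message = backlog.poll()) != null) {
            sink.accept(message);
            processed++;
        }
        // Anything left in the backlog must be picked up by a later run.
        return processed;
    }

    public static void main(String[] args) {
        TimeBoundedDrain drain = new TimeBoundedDrain(System::nanoTime, 1_000_000L); // 1 ms
        for (int i = 0; i < 5; i++) drain.enqueue("msg-" + i);
        int n = drain.dequeueMessages(m -> System.out.println("sent " + m));
        System.out.println(n + " processed, " + drain.backlogSize() + " left");
    }
}
```

The catch with any time bound is that, when the budget runs out with messages 
still queued, something has to reschedule the consumer; who does that is the 
trade-off discussed in the next paragraph.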

In your gist, I think we can run into some serious overscheduling 
(re-enqueueing of the consumer task) when the channel is unwritable. In that 
case, dequeueMessages will break out of its while loop immediately, but then 
immediately reschedule itself (assuming backlog > 0). We'll keep doing this, 
very aggressively, until the channel becomes writable again, yet we cannot 
make any meaningful progress in the meantime. To counteract this, I had 
dequeueMessages not reschedule itself; instead, handleMessageResult 
reschedules, because at that point (remember, we only attach the listener to 
the last message of the batch) we know the bytes have been written to the 
socket and the channel should be writable again. This way we only schedule 
(or directly execute) dequeueMessages when we need to. (Note: this was 
probably not apparent from the current code's comments, so I should 
definitely improve that.)
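To make the difference concrete, here is a toy, single-threaded model of the 
two strategies. Everything in it is hypothetical: FakeChannel, the one-byte 
messages, the capacity threshold, and flushing every `flushEvery` executor 
steps stand in for Netty and the real network. Only the names dequeueMessages 
and handleMessageResult come from the comment, and their bodies here are 
guesses. In real Netty code the equivalent hooks would presumably be 
Channel.isWritable() and a listener added to the last write's ChannelFuture.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/**
 * Hypothetical single-threaded model (plain Java, no Netty) of the two
 * rescheduling strategies. Each message is one byte; the fake channel turns
 * unwritable once `capacity` bytes are buffered, and the "network" flushes
 * it every `flushEvery` executor steps.
 */
public class ReschedulingModel {
    /** Fake channel with a single completion listener for the last write. */
    static final class FakeChannel {
        final int capacity;
        int buffered;
        Runnable lastWriteListener;

        FakeChannel(int capacity) { this.capacity = capacity; }

        boolean isWritable() { return buffered < capacity; }

        void write() { buffered++; }

        /** Flushes everything to the "socket" and fires the listener, if any. */
        void flushAll() {
            buffered = 0;
            Runnable listener = lastWriteListener;
            lastWriteListener = null;
            if (listener != null) listener.run();
        }
    }

    final Queue<Runnable> executor = new ArrayDeque<>();
    final FakeChannel channel;
    final boolean rescheduleFromListener;
    int backlog;     // pending messages
    int schedules;   // times the consumer task was enqueued
    int wastedRuns;  // consumer runs that wrote nothing

    ReschedulingModel(int messages, int capacity, boolean rescheduleFromListener) {
        this.backlog = messages;
        this.channel = new FakeChannel(capacity);
        this.rescheduleFromListener = rescheduleFromListener;
    }

    void schedule() {
        schedules++;
        executor.add(this::dequeueMessages);
    }

    /** Consumer task: write until the backlog is empty or the channel is unwritable. */
    void dequeueMessages() {
        int written = 0;
        while (backlog > 0 && channel.isWritable()) {
            channel.write();
            backlog--;
            written++;
        }
        if (written == 0) wastedRuns++;

        if (rescheduleFromListener) {
            // Attach the listener to the last write only; it fires once the bytes
            // are flushed, i.e. when the channel is writable again. (This toy model
            // only starts the consumer while the channel is writable.)
            if (written > 0) channel.lastWriteListener = this::handleMessageResult;
        } else if (backlog > 0) {
            // Naive strategy: reschedule immediately, even if still unwritable.
            schedule();
        }
    }

    /** Write-completion callback: reschedule only if there is more to send. */
    void handleMessageResult() {
        if (backlog > 0) schedule();
    }

    static ReschedulingModel run(boolean fromListener, int messages, int capacity,
                                 int flushEvery, int maxSteps) {
        ReschedulingModel m = new ReschedulingModel(messages, capacity, fromListener);
        m.schedule();
        for (int step = 1; step <= maxSteps && (m.backlog > 0 || !m.executor.isEmpty()); step++) {
            Runnable task = m.executor.poll();
            if (task != null) task.run();
            if (step % flushEvery == 0) m.channel.flushAll();
        }
        return m;
    }

    public static void main(String[] args) {
        for (boolean fromListener : new boolean[] {false, true}) {
            ReschedulingModel m = run(fromListener, 10, 4, 5, 1_000);
            System.out.printf("fromListener=%b schedules=%d wastedRuns=%d backlog=%d%n",
                    fromListener, m.schedules, m.wastedRuns, m.backlog);
        }
    }
}
```

With 10 messages, a capacity of 4 and a flush every 5 steps, both variants 
drain the backlog, but the naive one enqueues the consumer 11 times (8 of 
those runs write nothing), while the listener-driven one enqueues it 3 times 
with no wasted runs. The exact numbers depend entirely on the made-up 
parameters; the point is only that the naive variant spins for as long as the 
channel stays unwritable.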


> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-14747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Joseph Lynch
>            Assignee: Joseph Lynch
>            Priority: Major
>         Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0.11-after-jolynch-tweaks.svg, 4.0.7-before-my-changes.svg, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg, 
> ttop_NettyOutbound-Thread_spinning.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, 
> useast1e-i-08635fa1631601538_flamegraph_96node.svg, 
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, 
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
