[jira] [Commented] (CASSANDRA-15013) Message Flusher queue can grow unbounded, potentially running JVM out of memory

Benedict (JIRA) Wed, 10 Jul 2019 00:02:44 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881781#comment-16881781
 ]


Benedict commented on CASSANDRA-15013:
--------------------------------------

Thanks for these [~sumanth.pasupuleti]!

Just to log for watchers: I have had a brief chat with Sumanth, and we intend 
to capture flame graphs to see if we can explain the 10% (5 percentage point) 
bump in average CPU utilisation, which may well be down to every operation 
competing on a single shared variable.  This is a worst-case cost, given the 
formulation of this test, which was the whole point - but it's potentially 
still significant, so we might need to reduce contention by e.g. assigning each 
connection its own share of the pie at connection time, so that we only have to 
compete for the shared resource infrequently (when we overshoot our share, or 
need to dis/connect).  We'll see what the flame graphs show.
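
To make the "share of the pie" idea concrete, here is a minimal sketch. It is 
not the patch and not existing Cassandra code: the class names (GlobalBudget, 
ConnectionShare), the chunk size and the byte-based accounting are all 
assumptions made for illustration. Each connection takes a chunk from the 
shared counter when it connects, charges requests against that local chunk 
without touching the shared variable, and only goes back to the shared 
counter when it overshoots its share or disconnects.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ShareOfPieSketch {

    /** Hypothetical global budget shared by all connections (the contended variable). */
    static final class GlobalBudget {
        private final AtomicLong available;

        GlobalBudget(long capacity) {
            this.available = new AtomicLong(capacity);
        }

        /** Takes up to 'wanted' bytes and returns the amount granted, possibly 0. */
        long tryTake(long wanted) {
            while (true) {
                long current = available.get();
                long grant = Math.min(current, wanted);
                if (grant <= 0)
                    return 0;
                if (available.compareAndSet(current, current - grant))
                    return grant;
            }
        }

        void giveBack(long bytes) {
            available.addAndGet(bytes);
        }

        long available() {
            return available.get();
        }
    }

    /**
     * Hypothetical per-connection share. Assumed to be used only by the
     * connection's own event loop thread, so its fields need no synchronisation.
     */
    static final class ConnectionShare {
        private final GlobalBudget global;
        private final long chunk;
        private long reserved; // bytes taken from the global budget and held locally
        private long inUse;    // bytes charged to in-flight requests

        ConnectionShare(GlobalBudget global, long chunk) {
            this.global = global;
            this.chunk = chunk;
            this.reserved = global.tryTake(chunk); // one shared access at connection time
        }

        /** Common case: no shared access. Only an overshoot touches the global budget. */
        boolean tryAcquire(long bytes) {
            if (inUse + bytes > reserved) {
                long shortfall = inUse + bytes - reserved;
                // Any partial grant is kept locally even if the request is refused.
                reserved += global.tryTake(Math.max(shortfall, chunk));
                if (inUse + bytes > reserved)
                    return false; // caller should apply backpressure
            }
            inUse += bytes;
            return true;
        }

        void release(long bytes) {
            inUse -= bytes;
        }

        /** On disconnect, return the whole local share to the global budget. */
        void close() {
            global.giveBack(reserved);
            reserved = 0;
            inUse = 0;
        }
    }
}
```

The trade-off in this sketch is that an idle connection can sit on budget that 
a busy one could use; a real implementation would probably need some way to 
hand unused share back before disconnect.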

We will also try to explain the different shape of the heap utilisation 
graph - which might be as simple as only one node coordinating instead of all 
three, for instance.

> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Normal
>              Labels: pull-request-available
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png, 
> perftest_blockedthreads.png, perftest_connections_count.png, 
> perftest_cpu_usage.png, perftest_heap_usage.png, 
> perftest_readlatency_99th.png, perftest_readlatency_avg.png, 
> perftest_readops.png, perftest_writelatency_99th.png, 
> perftest_writelatency_avg.png, perftest_writeops.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue 
> bounded, since, in the current state, items get added to the queue without 
> any checks on queue size, or on the isWritable state of the netty outbound 
> buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

