[
https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16763453#comment-16763453
]
Benedict commented on CASSANDRA-15013:
--------------------------------------
Have you tested that this approach resolves your issues?
This change could introduce a deadlock. The request executor is also blocking,
so the Netty event loop could block waiting for room on the request executor,
while the request executor blocks on queueing to the Flusher (which is executed
on the eventLoop).
Probably we should disable reads from the inbound channel during overflow, in
both cases, rather than blocking either the eventLoop or the requestExecutor.
Blocking the eventLoop could also be the cause of your flusher queue growing so
large.
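
As a rough illustration of the "disable reads during overflow" idea (not the
patch under review, and not Cassandra's actual code), the sketch below pauses
reads once the pending bytes cross a high-water mark and resumes them once
they drain below a low-water mark, without ever blocking the caller. The names
InboundChannel, ReadBackpressure, onEnqueue and onComplete are hypothetical;
with Netty, the setAutoRead call would presumably map to
channel.config().setAutoRead(boolean). The sketch assumes both callbacks run
on the same thread (for example the channel's event loop), so it uses plain
fields.

```java
// Hypothetical stand-in for a Netty channel. With Netty this would likely be
// channel.config().setAutoRead(enabled).
interface InboundChannel {
    void setAutoRead(boolean enabled);
}

// Hypothetical gate that pauses inbound reads instead of blocking a thread.
// Assumption: onEnqueue and onComplete are always called from one thread.
final class ReadBackpressure {
    private final InboundChannel channel;
    private final long lowWaterMark;
    private final long highWaterMark;
    private long pendingBytes;
    private boolean paused;

    ReadBackpressure(InboundChannel channel, long lowWaterMark, long highWaterMark) {
        if (lowWaterMark < 0 || highWaterMark <= lowWaterMark) {
            throw new IllegalArgumentException("need 0 <= low < high");
        }
        this.channel = channel;
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
    }

    // Called when a request is accepted for processing. Never blocks: on
    // overflow it only stops further reads from the inbound channel.
    void onEnqueue(long bytes) {
        pendingBytes += bytes;
        if (!paused && pendingBytes >= highWaterMark) {
            paused = true;
            channel.setAutoRead(false);
        }
    }

    // Called once the corresponding response has been flushed. Reads resume
    // only after the backlog has drained to the low-water mark.
    void onComplete(long bytes) {
        pendingBytes -= bytes;
        if (paused && pendingBytes <= lowWaterMark) {
            paused = false;
            channel.setAutoRead(true);
        }
    }

    boolean isPaused() {
        return paused;
    }

    long pendingBytes() {
        return pendingBytes;
    }
}
```

Using two thresholds rather than one avoids toggling reads on and off for
every message when the backlog hovers around the limit.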
> Message Flusher queue can grow unbounded, potentially running JVM out of
> memory
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Sumanth Pasupuleti
> Assignee: Sumanth Pasupuleti
> Priority: Major
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: heap dump showing each ImmediateFlusher taking upto
> 600MB.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue
> bounded. In the current state, items are added to the queue without any check
> on the queue size and without checking the isWritable state of the netty
> outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.