[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819331#comment-16819331 ]
Benedict edited comment on CASSANDRA-15013 at 4/16/19 5:44 PM:
---------------------------------------------------------------
[~sumanth.pasupuleti] thanks for the update. To address the slow leak, I would
propose the following:
* Instead of a {{Map<InetAddress, AtomicLong>}} in {{Dispatcher}}, have a
{{Map<InetAddress, Dispatcher>}} in {{Server.Initializer}}
* Add a reference count to the {{Dispatcher}}, as well as an atomic
{{bytesInFlight}}
* In {{Server.Initializer.initChannel}}, look up the socket's InetAddress:
*# If there is no {{Dispatcher}}, create one
*# If the {{Dispatcher}} cannot increment its reference count, remove it from
the map and go back to (1)
*# Otherwise we've taken ownership of the {{Dispatcher}} and can use it
* Then, in the {{Dispatcher}}, override {{channelInactive}} to decrement our
reference count and remove ourselves from the map if we've been freed.
This also marginally reduces the per-message cost of enforcing these
constraints; a rough sketch of the shape follows below. WDYT?
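
For concreteness, a minimal sketch of that shape, assuming the proposal above;
{{RefCountedDispatcher}}, {{acquire}} and {{tryRetain}} are placeholder names
for illustration, not the actual {{Dispatcher}} / {{Server.Initializer}} code:

{code:java}
// Rough sketch only: RefCountedDispatcher stands in for Dispatcher, and acquire()
// for the lookup that would live in Server.Initializer.initChannel().
import io.netty.channel.ChannelHandler;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

@ChannelHandler.Sharable
final class RefCountedDispatcher extends ChannelInboundHandlerAdapter
{
    static final ConcurrentMap<InetAddress, RefCountedDispatcher> DISPATCHERS = new ConcurrentHashMap<>();

    private final InetAddress endpoint;
    private final AtomicLong refCount = new AtomicLong(1); // creator holds the first reference
    final AtomicLong bytesInFlight = new AtomicLong();     // shared across channels from this endpoint

    private RefCountedDispatcher(InetAddress endpoint)
    {
        this.endpoint = endpoint;
    }

    // Called from initChannel(): find or create the Dispatcher for this endpoint.
    static RefCountedDispatcher acquire(InetAddress endpoint)
    {
        while (true)
        {
            RefCountedDispatcher existing = DISPATCHERS.get(endpoint);
            if (existing == null)
            {
                RefCountedDispatcher created = new RefCountedDispatcher(endpoint);
                existing = DISPATCHERS.putIfAbsent(endpoint, created);
                if (existing == null)
                    return created;                     // (1) we installed a fresh one
            }
            if (existing.tryRetain())
                return existing;                        // (3) took shared ownership
            DISPATCHERS.remove(endpoint, existing);     // (2) freed concurrently; retry
        }
    }

    // Increment the reference count unless it has already dropped to zero.
    private boolean tryRetain()
    {
        long current;
        do
        {
            current = refCount.get();
            if (current == 0)
                return false;
        }
        while (!refCount.compareAndSet(current, current + 1));
        return true;
    }

    @Override
    public void channelInactive(ChannelHandlerContext ctx) throws Exception
    {
        // Drop our reference; the last channel out unregisters the Dispatcher.
        if (refCount.decrementAndGet() == 0)
            DISPATCHERS.remove(endpoint, this);
        super.channelInactive(ctx);
    }
}
{code}

The two-argument {{remove(key, value)}} only evicts a freed Dispatcher if it is
still the mapped instance, so a replacement created concurrently by another
channel is not clobbered.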
> Message Flusher queue can grow unbounded, potentially running JVM out of
> memory
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-15013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client
> Reporter: Sumanth Pasupuleti
> Assignee: Sumanth Pasupuleti
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0, 3.0.x, 3.11.x
>
> Attachments: BlockedEpollEventLoopFromHeapDump.png,
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap
> dump showing each ImmediateFlusher taking upto 600MB.png
>
>
> This is a follow-up ticket out of CASSANDRA-14855, to make the Flusher queue
> bounded: in the current state, items get added to the queue without any check
> on queue size, and without consulting the isWritable state of the Netty
> outbound buffer.
> We are seeing this issue hit our production 3.0 clusters quite often.
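
For illustration, a minimal sketch of the kind of gate the description refers
to, assuming a Netty {{Channel}}; {{BoundedFlushQueue}} and its byte cap are
hypothetical stand-ins, not Cassandra's actual Flusher:

{code:java}
// Illustration only: BoundedFlushQueue is a stand-in, not Cassandra's Flusher.
import io.netty.channel.Channel;

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class BoundedFlushQueue
{
    private final Queue<Object> queued = new ConcurrentLinkedQueue<>();
    private final AtomicLong bytesQueued = new AtomicLong();
    private final long maxQueuedBytes;

    BoundedFlushQueue(long maxQueuedBytes)
    {
        this.maxQueuedBytes = maxQueuedBytes;
    }

    // Refuse the item (so the caller can apply backpressure) instead of queueing without bound.
    boolean offer(Channel channel, Object item, long itemBytes)
    {
        if (!channel.isWritable())
            return false;                          // outbound buffer is already backed up
        if (bytesQueued.addAndGet(itemBytes) > maxQueuedBytes)
        {
            bytesQueued.addAndGet(-itemBytes);     // undo the reservation
            return false;
        }
        queued.add(item);
        return true;
    }
}
{code}

In practice the byte cap and the {{isWritable}} signal would be wired into the
existing flush path rather than a separate queue class.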