[ https://issues.apache.org/jira/browse/CASSANDRA-15013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768865#comment-16768865 ]

Sumanth Pasupuleti edited comment on CASSANDRA-15013 at 2/15/19 2:14 AM:
-------------------------------------------------------------------------

[~benedict] Your theory appears to be spot on; the heap dumps and thread dumps I now have all support it.
 * Evidence that the requestExecutor queue is full (indicated by taskPermit) and that all 128 workers are busy (indicated by workPermit)
 !RequestExecutorQueueFull.png|thumbnail! 
 * Evidence of blocked epollEventLoopGroup threads (from the heap dump)
 !BlockedEpollEventLoopFromHeapDump.png! 
 * Evidence of blocked epollEventLoopGroup threads (from the thread dump)
 !BlockedEpollEventLoopFromThreadDump.png! 
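To illustrate the mechanism described above, here is a simplified, hypothetical model in Java. It is not Cassandra's SEPExecutor or RequestThreadPoolExecutor; it only assumes that the executor admits a fixed number of tasks (running plus queued), tracked here by a `Semaphore` standing in for taskPermit. `BoundedRequestExecutor`, `submitBlocking` and `trySubmit` are illustrative names. Once every permit is held, `submitBlocking` parks the calling thread, which is how an event loop thread handing requests to a full executor would stop servicing all of its other channels.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

/**
 * Hypothetical model of a bounded request executor (NOT Cassandra's SEPExecutor).
 * A task may enter only while a permit is free; permits = workers + queue slots.
 */
final class BoundedRequestExecutor {
    private final Semaphore taskPermits;   // free slots for running + queued tasks
    private final ExecutorService workers; // fixed pool of worker threads

    BoundedRequestExecutor(int workerCount, int maxQueued) {
        this.taskPermits = new Semaphore(workerCount + maxQueued);
        this.workers = Executors.newFixedThreadPool(workerCount);
    }

    /** Blocking submit: the caller (e.g. an event loop thread) waits for a free permit. */
    void submitBlocking(Runnable task) throws InterruptedException {
        taskPermits.acquire();
        dispatch(task);
    }

    /** Non-blocking submit: fails fast so the caller can apply backpressure instead. */
    boolean trySubmit(Runnable task) {
        if (!taskPermits.tryAcquire()) {
            return false;
        }
        dispatch(task);
        return true;
    }

    private void dispatch(Runnable task) {
        workers.execute(() -> {
            try {
                task.run();
            } finally {
                taskPermits.release(); // slot frees up only when the task finishes
            }
        });
    }

    int availablePermits() {
        return taskPermits.availablePermits();
    }

    /** Graceful shutdown: already-submitted tasks still run to completion. */
    void shutdown() {
        workers.shutdown();
    }
}
```

With 2 workers and 1 queue slot, three tasks that never finish exhaust the permits, so a fourth `trySubmit` returns false; a `submitBlocking` call at that point would wait until one of the running tasks completes.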

 


> Message Flusher queue can grow unbounded, potentially running JVM out of 
> memory
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15013
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15013
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client
>            Reporter: Sumanth Pasupuleti
>            Assignee: Sumanth Pasupuleti
>            Priority: Major
>             Fix For: 4.0, 3.0.x, 3.11.x
>
>         Attachments: BlockedEpollEventLoopFromHeapDump.png, 
> BlockedEpollEventLoopFromThreadDump.png, RequestExecutorQueueFull.png, heap 
> dump showing each ImmediateFlusher taking upto 600MB.png, 
> image-2019-02-14-17-59-50-794.png
>
>
> This is a follow-up ticket to CASSANDRA-14855 to make the Flusher queue 
> bounded. Currently, items are added to the queue without any check on the 
> queue size, and without any check of the Netty outbound buffer's isWritable 
> state.
> We are seeing this issue hit our production 3.0 clusters quite often.
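A minimal sketch of the bounded behaviour the ticket asks for, under stated assumptions: `BoundedFlusher` is a hypothetical class, not Cassandra's Flusher, and Netty's `Channel.isWritable()` and channel writes are replaced by a `BooleanSupplier` and a `Consumer` so the example runs without Netty. `enqueue` refuses new items once a fixed capacity is reached instead of letting the queue grow, and `flush` stops draining as soon as the outbound side reports that it is not writable.

```java
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/**
 * Hypothetical bounded flusher (NOT Cassandra's Flusher or Netty's API).
 * isWritable stands in for Channel.isWritable(); write stands in for a channel write.
 */
final class BoundedFlusher<T> {
    private final Queue<T> queue;
    private final BooleanSupplier isWritable;
    private final Consumer<T> write;

    BoundedFlusher(int capacity, BooleanSupplier isWritable, Consumer<T> write) {
        this.queue = new ArrayBlockingQueue<>(capacity); // fixed upper bound on pending items
        this.isWritable = isWritable;
        this.write = write;
    }

    /** Returns false when full instead of growing; the caller must apply backpressure. */
    boolean enqueue(T item) {
        return queue.offer(item);
    }

    /** Drains items only while the outbound side reports it can accept more data. */
    int flush() {
        int written = 0;
        while (isWritable.getAsBoolean()) {
            T item = queue.poll();
            if (item == null) {
                break; // nothing left to flush
            }
            write.accept(item);
            written++;
        }
        return written;
    }

    int size() {
        return queue.size();
    }
}
```

How a caller should react when `enqueue` returns false (for example, by pausing reads on the connection) is a separate design decision that this sketch leaves open.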



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
