[ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198034#comment-17198034
 ] 

Yifan Cai commented on CASSANDRA-15214:
---------------------------------------

> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.

Based on code inspection, exceptions/throwables raised within Netty are already 
handled. 
For inbound, the {{InboundMessageHandler}} implements {{exceptionCaught()}} 
which invokes {{JVMStabilityInspector}}. The message handler is the last one in 
the inbound direction, and there is no previous handler that handles 
exceptions. So the message handler should handle all exceptions from that 
direction. However, the {{exceptionCaught()}} override in 
{{StreamingInboundHandler}} does not invoke {{JVMStabilityInspector}}, so it 
could swallow OOM errors. 
For outbound, {{JVMStabilityInspector}} is invoked when the channel future 
fails, and in several other places. 

All of the above call sites invoke {{JVMStabilityInspector}} with 
{{propagateOutOfMemory}} disabled, so the inspector just swallows the OOM 
errors and does not let the JVM handle them. [~benedict], what is the reason 
for doing so in the inbound/outbound connections? 
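
To make the effect of the flag concrete, here is a minimal sketch. It is not 
Cassandra's or Netty's code: {{InspectorSketch}}, {{inspect()}} and 
{{runGuarded()}} are hypothetical stand-ins, and the only thing assumed from the 
discussion above is that a {{propagateOutOfMemory}} boolean decides whether the 
OOM is rethrown. {{runGuarded()}} plays the role of a framework callback that 
catches every {{Throwable}}, as {{exceptionCaught()}} effectively does.

```java
// Hypothetical stand-in for the inspector discussed above; not Cassandra's class.
final class InspectorSketch {
    private InspectorSketch() {}

    // Assumed behaviour: rethrow the OOM only when propagateOutOfMemory is true.
    static void inspect(Throwable t, boolean propagateOutOfMemory) {
        if (t instanceof OutOfMemoryError && propagateOutOfMemory) {
            throw (OutOfMemoryError) t; // leaves the catch-all and reaches the caller
        }
        // Otherwise the throwable is dropped here, i.e. swallowed.
    }

    // Mimics a catch-all callback: returns true if the task completed normally,
    // false if a throwable was caught and swallowed by the inspector.
    static boolean runGuarded(Runnable task, boolean propagateOutOfMemory) {
        try {
            task.run();
            return true;
        } catch (Throwable t) {
            inspect(t, propagateOutOfMemory);
            return false;
        }
    }

    public static void main(String[] args) {
        Runnable failing = () -> { throw new OutOfMemoryError("simulated"); };
        // Flag disabled: the simulated OOM never leaves runGuarded().
        System.out.println("completed normally: " + runGuarded(failing, false));
        // Flag enabled: the simulated OOM escapes to the caller.
        try {
            runGuarded(failing, true);
        } catch (OutOfMemoryError e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```

With the flag disabled, the caller only sees a normal return, which is the 
swallowing behaviour described above.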

> OOMs caught and not rethrown
> ----------------------------
>
>                 Key: CASSANDRA-15214
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Messaging/Client, Messaging/Internode
>            Reporter: Benedict Elliott Smith
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc
>
>         Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a 
> single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed 
> future proof approach, it may be worth paying the cost of a single thread.
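
The single-thread idea in the description could be sketched roughly as follows. 
This is only an illustration under assumptions: {{ErrorPropagator}}, 
{{submit()}} and {{awaitExit()}} are hypothetical names, not existing Cassandra 
or Netty APIs. A dedicated thread blocks on a queue and rethrows the first 
{{Error}} handed to it, so the error is uncaught on that thread and reaches 
whatever uncaught-exception handler is installed, regardless of what the 
submitting code path catches.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a thread that exists only to rethrow fatal errors.
final class ErrorPropagator {
    private final BlockingQueue<Error> pending = new LinkedBlockingQueue<>();
    private final Thread thread;

    ErrorPropagator(Thread.UncaughtExceptionHandler handler) {
        thread = new Thread(() -> {
            try {
                // Block until some code path hands over an error, then rethrow it
                // outside any catch-all so it is uncaught on this thread.
                throw pending.take();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // shut down quietly
            }
        }, "error-propagator");
        thread.setDaemon(true);
        thread.setUncaughtExceptionHandler(handler);
        thread.start();
    }

    // Called from a catch block that would otherwise swallow the error.
    void submit(Error e) {
        pending.offer(e);
    }

    // Waits up to the given time; returns true once the propagator thread has exited.
    boolean awaitExit(long millis) {
        try {
            thread.join(millis);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        return !thread.isAlive();
    }
}
```

A caller would create one instance at startup and call {{submit()}} from any 
place that catches an {{OutOfMemoryError}} it cannot rethrow itself. What the 
installed handler then does (log, halt, and so on) is outside this sketch.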



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
