[
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901497#comment-16901497
]
Yifan Cai commented on CASSANDRA-15214:
---------------------------------------
I ran several experiments on OOM scenarios to check whether the HotSpot
handlers work as expected, namely that they kill the process.
The results show that the handlers, OnOutOfMemoryError and
ExitOnOutOfMemoryError, are only effective for heap OOM.
*Experiments*
The experiments are designed to emulate what happens in C* while staying minimal.
Each one installs a handler via Thread.setDefaultUncaughtExceptionHandler that
simply re-throws the OOM error, in the hope that the HotSpot handlers take over.
OpenJDK 8 was used.
You can find all 5 experiments in the attached [^oom-experiments.zip].
{code:java}
├── OomExperimentExceedsDirectBuffer.java
├── OomExperimentExceedsDirectBufferRapidAlloc.java
├── OomExperimentExceedsHeap.java
├── OomExperimentSimple.java
└── OomExperimentSimpleJustExit.java{code}
Among those experiments, only one (OomExperimentExceedsHeap) successfully
triggers the handlers.
The rest do throw the OutOfMemoryError, but the handlers are not triggered.
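To make the Java-level part of this concrete, here is a minimal sketch of what happens when an OutOfMemoryError is created and thrown by Java code rather than by the heap allocator. It is not one of the attached experiments: the class name OomHandlerSketch and its method are hypothetical, the error is simulated (no memory is exhausted), and the handler records the error instead of re-throwing it. It only shows that such an error travels up the stack like any other Throwable and ends in the default uncaught exception handler; it does not exercise the HotSpot flags.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not part of oom-experiments.zip.
class OomHandlerSketch {
    // Last throwable that reached the default uncaught exception handler.
    static final AtomicReference<Throwable> seen = new AtomicReference<>();

    // Starts a worker that throws an OutOfMemoryError constructed in Java code
    // and returns whatever the default uncaught exception handler received.
    static Throwable runWorkerThatThrowsOom() {
        seen.set(null);
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> seen.set(error));
        Thread worker = new Thread(() -> {
            // Simulated OOM: an ordinary Java 'throw', no allocation failure involved.
            throw new OutOfMemoryError("Direct buffer memory (simulated)");
        });
        worker.start();
        try {
            // The handler runs on the dying worker thread, so it has finished
            // by the time join() returns.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for worker", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runWorkerThatThrowsOom());
    }
}
```

The error shows up in the handler as a plain java.lang.OutOfMemoryError, so the only code that gets a chance to react to it is the application's own.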
*Some Research*
The likely cause is that the JVM implementation allocates heap memory and
direct buffer memory through different code paths. (OpenJDK 8 is the reference.)
Heap memory allocation happens at
[collectedHeap.inline.hpp#CollectedHeap::common_mem_allocate_noinit|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp#L149].
When the allocation fails, it calls
[report_java_out_of_memory|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/utilities/debug.cpp#L287],
which is responsible for creating a heap dump on OOM and running the handlers.
Meanwhile, allocating a direct buffer takes a different path. In
java.nio.DirectByteBuffer, OOM can happen in 2 places.
1. Bits.reserveMemory, which finds that there is not enough direct memory and
throws OOM. In this case, I do not think the OOM is caught and handled in the
JVM to trigger report_java_out_of_memory.
2. unsafe.allocateMemory, which calls malloc directly and, [if the allocation
fails, throws
OOM|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/prims/unsafe.cpp#L606].
Again, this OOM is thrown so that the application can handle it.
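The first case can be pictured with a simplified model of limit-based accounting. This is a sketch under assumptions, not the JDK source: the class DirectMemoryLimitSketch, its methods, and the error message are hypothetical, and the real Bits.reserveMemory does more work before giving up. The point it illustrates is that the OutOfMemoryError is constructed and thrown by ordinary Java code once a counter exceeds a limit, so nothing on this path reaches report_java_out_of_memory.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of limit-based direct memory accounting.
class DirectMemoryLimitSketch {
    private final long maxBytes;                       // stands in for the direct memory limit
    private final AtomicLong reserved = new AtomicLong();

    DirectMemoryLimitSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // Reserves 'size' bytes, or throws an OutOfMemoryError created in Java code.
    void reserve(long size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        while (true) {
            long current = reserved.get();
            if (size > maxBytes - current) {
                // Plain Java 'throw': the VM's OOM reporting is not involved.
                throw new OutOfMemoryError("Direct buffer memory (simulated limit)");
            }
            if (reserved.compareAndSet(current, current + size)) {
                return;
            }
        }
    }

    // Returns previously reserved bytes to the pool.
    void unreserve(long size) {
        reserved.addAndGet(-size);
    }

    long reservedBytes() {
        return reserved.get();
    }

    public static void main(String[] args) {
        DirectMemoryLimitSketch pool = new DirectMemoryLimitSketch(1024);
        pool.reserve(1000);
        try {
            pool.reserve(100); // exceeds the simulated limit
        } catch (OutOfMemoryError e) {
            System.out.println("caught in application code: " + e.getMessage());
        }
    }
}
```

In this model a failed reservation leaves the counter unchanged, and the caller sees the error as a regular Throwable that it may catch, which matches the behaviour observed in the direct buffer experiments.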
Further evidence is that
[report_java_out_of_memory|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/utilities/debug.cpp#L287],
the only place that triggers the handlers, is not invoked during
unsafe.allocateMemory. Here are [all the references to the
method|https://github.com/AdoptOpenJDK/openjdk-jdk8u/search?q=report_java_out_of_memory&unscoped_q=report_java_out_of_memory].
Because of that, jvmkill or jvmquake, mentioned in the ticket, might not work.
These tools rely on the notification from
[JvmtiExport::post_resource_exhausted|https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/hotspot/src/share/vm/gc_interface/collectedHeap.inline.hpp#L153],
which is not present in either of the 2 places where a direct buffer OOM can
happen. Here is the implementation of
[jvmkill|https://github.com/airlift/jvmkill/blob/master/jvmkill.c#L24] (less
than 100 lines).
> OOMs caught and not rethrown
> ----------------------------
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Client, Messaging/Internode
> Reporter: Benedict
> Priority: Normal
> Fix For: 4.0
>
> Attachments: oom-experiments.zip
>
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions,
> so presently there is no way to ensure that an OOM reaches the JVM handler to
> trigger a crash/heapdump.
> It may be that the simplest most consistent way to do this would be to have a
> single thread spawned at startup that waits for any exceptions we must
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed
> future proof approach, it may be worth paying the cost of a single thread.