[jira] [Commented] (CASSANDRA-13886) OOM put node in limbo

Tommy Stendahl (JIRA) Tue, 26 Sep 2017 04:47:16 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-13886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16180640#comment-16180640
 ]


Tommy Stendahl commented on CASSANDRA-13886:
--------------------------------------------

I have done some work on this issue. Even though it happens very seldom, it’s 
very bad when it does: since the JVM doesn’t die properly, our monitoring 
system doesn’t restart Cassandra on this node, and manual intervention is 
required. The workaround with {{-XX:+ExitOnOutOfMemoryError}} works fine, and 
you can also use {{-XX:+CrashOnOutOfMemoryError}} if you want core dumps. But 
as I understand it, these options are only available from Java 8u92, so they 
might not be an option for everyone. I think an alternative is to improve 
{{HeapUtils.generateHeapDump()}} so that it catches {{Throwable}}, which 
prevents any exception or error from leaking out of it. Execution would then 
continue in {{JVMStabilityInspector.inspectThrowable()}} until it reaches 
{{killer.killCurrentJVM(t)}}, which properly kills the JVM.
I have prepared a patch for this on the 2.2 branch, but it should merge fine to 
all branches.
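
To illustrate the idea, here is a minimal, self-contained sketch. It is not the Cassandra source and not the attached patch: the class {{HeapDumpGuardSketch}}, the {{Killer}} and {{HeapDumper}} interfaces, and the method {{generateHeapDumpSafely}} are hypothetical stand-ins, and the real {{inspectThrowable()}} is assumed to be more involved than shown. The point is only that catching {{Throwable}} around the heap dump step lets control reach the killer even when the dump fails with an {{Error}} such as {{NoClassDefFoundError}}.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class HeapDumpGuardSketch {
    /** Hypothetical stand-in for the component that terminates the JVM. */
    interface Killer { void killCurrentJVM(Throwable t); }

    /** Hypothetical stand-in for the heap dump step, which may itself fail. */
    interface HeapDumper { void dump() throws Exception; }

    /** Runs the heap dump and swallows any failure so the caller can continue. */
    static void generateHeapDumpSafely(HeapDumper dumper) {
        try {
            dumper.dump();
        } catch (Throwable t) {
            // Throwable rather than Exception: the reported failure was a
            // NoClassDefFoundError, which is an Error and not an Exception.
            System.err.println("Heap dump failed, continuing shutdown: " + t);
        }
    }

    /** Simplified inspection: on OOM, attempt a heap dump, then always call the killer. */
    static void inspectThrowable(Throwable t, HeapDumper dumper, Killer killer) {
        if (t instanceof OutOfMemoryError) {
            generateHeapDumpSafely(dumper);
            killer.killCurrentJVM(t);
        }
    }

    public static void main(String[] args) {
        AtomicBoolean killed = new AtomicBoolean(false);
        // Simulate the failure from the stack trace in the issue description.
        HeapDumper failing = () -> {
            throw new NoClassDefFoundError("Could not initialize class java.lang.UNIXProcess");
        };
        // The test killer only records the call instead of halting the JVM.
        inspectThrowable(new OutOfMemoryError("simulated"), failing, t -> killed.set(true));
        System.out.println("killer reached: " + killed.get()); // prints "killer reached: true"
    }
}
```

Without the {{catch (Throwable t)}} block, the {{NoClassDefFoundError}} would propagate out of {{inspectThrowable}} and the killer would never be called, which matches the half-up state described in the issue.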

> OOM put node in limbo
> ---------------------
>
>                 Key: CASSANDRA-13886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13886
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra version 2.2.10
>            Reporter: Marcus Olsson
>            Assignee: Tommy Stendahl
>            Priority: Minor
>              Labels: lhf
>
> In one of our test clusters we have had some issues with OOM. While working 
> on fixing this, we discovered that one of the nodes that hit an OOM was not 
> actually shut down properly. Instead it went into a half-up state where the 
> affected node considered itself up while all other nodes considered it 
> down.
> The following stack trace was observed, which seems to be the cause of this:
> {noformat}
> java.lang.NoClassDefFoundError: Could not initialize class java.lang.UNIXProcess
>         at java.lang.ProcessImpl.start(ProcessImpl.java:130) ~[na:1.8.0_131]
>         at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029) ~[na:1.8.0_131]
>         at java.lang.Runtime.exec(Runtime.java:620) ~[na:1.8.0_131]
>         at java.lang.Runtime.exec(Runtime.java:485) ~[na:1.8.0_131]
>         at org.apache.cassandra.utils.HeapUtils.generateHeapDump(HeapUtils.java:88) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.utils.JVMStabilityInspector.inspectThrowable(JVMStabilityInspector.java:56) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:168) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) ~[apache-cassandra-2.2.10.jar:2.2.10]
>         at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> {noformat}
> It seems that if an unexpected exception/error is thrown inside 
> JVMStabilityInspector.inspectThrowable, the JVM is not actually shut down but 
> instead keeps on running. My expectation is that the JVM should shut down 
> if an OOM is thrown.
> A potential workaround is to add:
> {noformat}
> JVM_OPTS="$JVM_OPTS -XX:+ExitOnOutOfMemoryError"
> {noformat}
> to cassandra-env.sh.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
