[ 
https://issues.apache.org/jira/browse/CASSANDRA-13006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195149#comment-16195149
 ] 

Benjamin Lerer commented on CASSANDRA-13006:
--------------------------------------------


[~urandom], [~brandon.williams], [~tjake]

Sorry for the delay.

I pushed some patches for
[2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...blerer:13006-2.2],
[3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...blerer:13006-3.0],
[3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...blerer:13006-3.11] and
[trunk|https://github.com/apache/cassandra/compare/trunk...blerer:13006-trunk].

The branches differ only in the configuration files 
({{cassandra-env.sh}} and {{cassandra-env.ps1}}).

The patches let the JVM handle {{OutOfMemoryErrors}} through the JVM 
{{OnOutOfMemoryError}}, {{ExitOnOutOfMemoryError}} or 
{{CrashOnOutOfMemoryError}} options.
As the {{ExitOnOutOfMemoryError}} and {{CrashOnOutOfMemoryError}} options are 
only supported since Oracle JDK 7 update 101 and JDK 8 update 92, Cassandra 
uses the {{OnOutOfMemoryError}} option by default.
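The version constraint above can be illustrated with a small Java sketch. It is not part of the patches: the class and method names are hypothetical, it only understands legacy {{1.7.0_NNN}} / {{1.8.0_NNN}} version strings, and it assumes any Java 9+ runtime supports the two flags.

```java
// Hypothetical helper (not part of the patches): checks whether a
// legacy "1.7.0_NNN" / "1.8.0_NNN" java.version string is at least
// 7u101 or 8u92, the minimum updates quoted above for
// ExitOnOutOfMemoryError and CrashOnOutOfMemoryError.
public final class OomFlagSupport {

    static boolean supportsExitOnOutOfMemoryError(String javaVersion) {
        // Assumption: Java 9+ versions ("9", "11.0.2", ...) support the flags.
        if (!javaVersion.startsWith("1.")) {
            return true;
        }
        // "1.8.0_92-b14" -> ["1", "8", "0", "92", "b14"]
        String[] parts = javaVersion.split("[._-]");
        if (parts.length < 4) {
            return false; // no update number, e.g. "1.8.0": treat as unsupported
        }
        int major = Integer.parseInt(parts[1]);
        int update = Integer.parseInt(parts[3]);
        if (major == 7) {
            return update >= 101;
        }
        if (major == 8) {
            return update >= 92;
        }
        return major > 8;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println(version + " supports ExitOnOutOfMemoryError: "
                + supportsExitOnOutOfMemoryError(version));
    }
}
```

Falling back to {{OnOutOfMemoryError}} avoids this kind of version detection entirely, which is why it is the safer default.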

A startup check emits a warning if none of these options is used. This check 
is there to ensure that {{OOM}} errors are properly handled and that C* cannot 
continue to run in an unstable state that could cause data corruption.
The patches add no check for the {{HeapDumpOnOutOfMemoryError}} option because, 
in some cases, administrators prefer to disable heap dumps.
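A check of this kind can be sketched in Java by inspecting the arguments the JVM was started with. This is only an illustration under assumptions: {{OomOptionsCheck}} and {{hasOutOfMemoryHandling}} are hypothetical names, not Cassandra's actual startup check code.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical sketch of the startup check described above; names are
// illustrative, not Cassandra's actual StartupChecks implementation.
public final class OomOptionsCheck {

    /** Returns true if at least one JVM-level OOM handling option is present. */
    static boolean hasOutOfMemoryHandling(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-XX:OnOutOfMemoryError=")
                    || arg.equals("-XX:+ExitOnOutOfMemoryError")
                    || arg.equals("-XX:+CrashOnOutOfMemoryError")) {
                return true;
            }
        }
        // HeapDumpOnOutOfMemoryError is deliberately not considered here.
        return false;
    }

    public static void main(String[] args) {
        // Options passed to this JVM (not the arguments of main).
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        if (!hasOutOfMemoryHandling(jvmArgs)) {
            // Only a warning: startup is not aborted.
            System.err.println("WARN: none of -XX:OnOutOfMemoryError, "
                    + "-XX:+ExitOnOutOfMemoryError or -XX:+CrashOnOutOfMemoryError "
                    + "is set; an OOM could leave the node running in an unstable state.");
        }
    }
}
```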

The {{cassandra-env.sh}} has a new variable {{JVM_ON_OUT_OF_MEMORY_ERROR_OPT}} 
which should be used to specify the {{OnOutOfMemoryError}} option. As bash 
splits the expanded value of a variable on white space without taking the 
quotes it contains into account, specifying the {{OnOutOfMemoryError}} option 
as part of the {{JVM_OPTS}} variable cannot work for an option value that 
contains spaces, such as {{kill -9 %p}}.
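The splitting problem can be reproduced outside of bash with a short Java illustration. The {{splitLikeUnquotedExpansion}} method is hypothetical and only approximates bash word splitting with the default {{IFS}} (it ignores globbing, for instance), but it shows how the embedded quotes end up as literal characters inside broken-up words.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics how an unquoted $JVM_OPTS expansion is split
// on whitespace while the quotes inside the value stay literal characters.
public final class WordSplitting {

    static List<String> splitLikeUnquotedExpansion(String value) {
        // Default IFS splitting breaks on runs of spaces, tabs and newlines.
        return Arrays.asList(value.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String jvmOpts = "-Xmx4G -XX:OnOutOfMemoryError=\"kill -9 %p\"";
        System.out.println(splitLikeUnquotedExpansion(jvmOpts));
        // prints [-Xmx4G, -XX:OnOutOfMemoryError="kill, -9, %p"]
    }
}
```

The JVM would receive {{-9}} and {{%p"}} as separate arguments. A dedicated variable can instead be expanded inside double quotes, so the whole option reaches the JVM as a single argument.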

Before generating a heap dump, C* used to log a heap histogram using {{jmap}}. 
If the heap was large, reading the heap dump could take a few hours, whereas 
the heap histogram can help to debug the problem much faster. The patches keep 
the possibility to print a heap histogram on OOM error but disable it by 
default. To enable it, the {{cassandra.printHeapHistogramOnOutOfMemoryError}} 
system property must be set to {{true}}.
As generating the histogram for only the live objects (using 
{{jmap -histo:live}}) would trigger a garbage collection before generating the 
histogram, I preferred to stick with {{jmap -histo}} to minimize the risks.
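The opt-in behaviour can be sketched in Java as follows. Only the system property name comes from the patches; the class and method are hypothetical, and the sketch merely builds the {{jmap}} command rather than running it. {{ProcessHandle}} requires Java 9 or later.

```java
import java.util.List;

// Hypothetical sketch: returns the jmap command only when the
// cassandra.printHeapHistogramOnOutOfMemoryError property is "true".
public final class HeapHistogramOnOom {

    static final String PROPERTY = "cassandra.printHeapHistogramOnOutOfMemoryError";

    /** Returns the command to run, or an empty list when the feature is disabled. */
    static List<String> histogramCommand(long pid) {
        // Boolean.getBoolean is false unless the property equals "true" (ignoring case),
        // so the histogram is disabled by default.
        if (!Boolean.getBoolean(PROPERTY)) {
            return List.of();
        }
        // "-histo" rather than "-histo:live": ":live" would force a full GC first.
        return List.of("jmap", "-histo", Long.toString(pid));
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid(); // Java 9+
        System.out.println(histogramCommand(pid));
    }
}
```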

The previous implementation suffered from two problems:
* If several OOM errors were thrown in a short time span, each of them would 
trigger a heap histogram and a heap dump (see this 
[comment|https://issues.apache.org/jira/browse/CASSANDRA-13006?focusedCommentId=16118421&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16118421])
* If an exception was thrown while C* was trying to generate the heap dump, C* 
would not be shut down and would continue running in an unstable state (see 
CASSANDRA-13886)

The patches fix those problems for the case where a heap histogram needs to be 
logged. When the histogram is not requested, those problems no longer exist.
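Both fixes follow a common pattern, sketched below in Java under assumptions: this is not the code of the patches, and {{OomHandler}} with its two {{Runnable}} hooks is hypothetical. An {{AtomicBoolean}} lets only the first OOM trigger the diagnostic work, and a {{finally}} block guarantees the shutdown runs even if that work throws.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch (not the actual patch) of the two fixes:
// 1. only the first OutOfMemoryError triggers the diagnostic work;
// 2. the node is shut down even if that work itself fails.
public final class OomHandler {

    private final AtomicBoolean handled = new AtomicBoolean(false);
    private final Runnable diagnostics; // e.g. logging a heap histogram
    private final Runnable shutdown;    // e.g. stopping the process

    OomHandler(Runnable diagnostics, Runnable shutdown) {
        this.diagnostics = diagnostics;
        this.shutdown = shutdown;
    }

    void onOutOfMemory(OutOfMemoryError error) {
        // compareAndSet lets exactly one caller through, however many OOMs occur.
        if (!handled.compareAndSet(false, true)) {
            return;
        }
        try {
            diagnostics.run();
        } catch (Throwable t) {
            // Log and continue; the finally block runs the shutdown either way.
            t.printStackTrace();
        } finally {
            shutdown.run();
        }
    }

    public static void main(String[] args) {
        OomHandler handler = new OomHandler(
                () -> System.out.println("printing heap histogram"),
                () -> System.out.println("shutting down"));
        handler.onOutOfMemory(new OutOfMemoryError("first"));
        handler.onOutOfMemory(new OutOfMemoryError("second")); // ignored
    }
}
```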

CI looks good for the unit tests. 
The changes to the {{cassandra}} startup script break the DTests, but before 
changing the {{DTests}} framework I would prefer to get a first review of the 
patches.

[~JoshuaMcKenzie] could you review the patches? 

> Disable automatic heap dumps on OOM error
> -----------------------------------------
>
>                 Key: CASSANDRA-13006
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13006
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Configuration
>            Reporter: anmols
>            Assignee: Benjamin Lerer
>            Priority: Minor
>             Fix For: 3.0.15
>
>         Attachments: 13006-3.0.9.txt
>
>
> With CASSANDRA-9861, a change was added to enable collecting heap dumps by 
> default if the process encountered an OOM error. These heap dumps are stored 
> in the Apache Cassandra home directory unless configured otherwise (see 
> [Cassandra Support 
> Document|https://support.datastax.com/hc/en-us/articles/204225959-Generating-and-Analyzing-Heap-Dumps]
>  for this feature).
>  
> The creation and storage of heap dumps aids debugging and investigative 
> workflows, but is not desirable for a production environment where these 
> heap dumps may occupy a large amount of disk space and require manual 
> intervention for cleanups. 
>  
> Options for managing heap dumps on out of memory errors and configuring the 
> paths for these heap dumps are already available in the JVM. The current 
> behavior conflicts with the Boolean JVM flag HeapDumpOnOutOfMemoryError. 
>  
> A patch can be proposed here that would make the heap dump on OOM error honor 
> the HeapDumpOnOutOfMemoryError flag. Users who would want to still generate 
> heap dumps on OOM errors can set the -XX:+HeapDumpOnOutOfMemoryError JVM 
> option.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
