We have seen much better stability (and MUCH shorter GC pauses) from G1 with a 
variety of heap sizes. I don’t even consider CMS any more.


Sean Durity

From: Gopal, Dhruva <dhruva.go...@aspect.com>
Sent: Tuesday, April 04, 2017 5:34 PM
To: user@cassandra.apache.org
Subject: Re: cassandra OOM

Thanks, that’s interesting – so CMS is a better option for 
stability/performance? We’ll try this out in our cluster.

From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, April 3, 2017 at 10:31 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: cassandra OOM

Hi,

We've seen G1GC go OOM repeatedly on production clusters with a 16GB heap when 
the workload is intense, and given you're running on m4.2xl instances, I 
wouldn't go over 16GB for the heap.

I'd suggest reverting to CMS, using a 16GB heap and up to 6GB of new gen. You 
can use 5 as an initial value for MaxTenuringThreshold and activate GC logging 
to fine-tune the settings afterwards.
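
For reference, a rough sketch of what that could look like in jvm.options (the 
heap, new gen and MaxTenuringThreshold values come from the suggestion above; 
the other flags are simply taken from the commented-out CMS section of the 
stock jvm.options quoted further down this thread, so treat the block as a 
starting point to validate against GC logs rather than a definitive 
configuration):

-Xms16G
-Xmx16G
-Xmn6G
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=5
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
# GC logging, to fine-tune the settings afterwards
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintTenuringDistribution
-Xloggc:/var/log/cassandra/gc.log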

FYI CMS tends to perform better than G1 even though it's a little bit harder to 
tune.

Cheers,

On Mon, Apr 3, 2017 at 10:54 PM Gopal, Dhruva 
<dhruva.go...@aspect.com> wrote:
16 Gig heap, with G1. Pertinent info from jvm.options below (we’re using 
m2.2xlarge instances in AWS):


#################
# HEAP SETTINGS #
#################

# Heap size is automatically calculated by cassandra-env based on this
# formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
# That is:
# - calculate 1/2 ram and cap to 1024MB
# - calculate 1/4 ram and cap to 8192MB
# - pick the max
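# (Worked example, added for illustration: on an instance with ~32 GB of
#  RAM, this works out to max(min(16 GB, 1024 MB), min(8 GB, 8192 MB)) =
#  max(1024 MB, 8 GB) = 8 GB, which the explicit 16G settings below
#  override.)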
#
# For production use you may wish to adjust this for your environment.
# If that's the case, uncomment the -Xmx and Xms options below to override the
# automatic calculation of JVM heap memory.
#
# It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
# the same value to avoid stop-the-world GC pauses during resize, and
# so that we can lock the heap in memory on startup to prevent any
# of it from being swapped out.
-Xms16G
-Xmx16G

# Young generation size is automatically calculated by cassandra-env
# based on this formula: min(100 * num_cores, 1/4 * heap size)
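# (Worked example, added for illustration: with 8 cores and a 16 GB heap,
#  min(100 MB * 8, 16384 MB / 4) = min(800 MB, 4096 MB) = 800 MB, which
#  matches the -Xmn800M example below.)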
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# It is not recommended to set the young generation size if using the
# G1 GC, since that will override the target pause-time goal.
# More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
#
# The example below assumes a modern 8-core+ machine for decent
# times. If in doubt, and if you do not particularly want to tweak, go
# 100 MB per physical CPU core.
#-Xmn800M

#################
#  GC SETTINGS  #
#################

### CMS Settings

#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
#-XX:+CMSParallelRemarkEnabled
#-XX:SurvivorRatio=8
#-XX:MaxTenuringThreshold=1
#-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
#-XX:CMSWaitDuration=10000
#-XX:+CMSParallelInitialMarkEnabled
#-XX:+CMSEdenChunksRecordAlways
# some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
#-XX:+CMSClassUnloadingEnabled

### G1 Settings (experimental, comment previous section and uncomment section below to enable)

## Use the Hotspot garbage-first collector.
-XX:+UseG1GC
#
## Have the JVM do less remembered set work during STW, instead
## preferring concurrent GC. Reduces p99.9 latency.
-XX:G1RSetUpdatingPauseTimePercent=5
#
## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
## 200ms is the JVM default and lowest viable setting
## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
-XX:MaxGCPauseMillis=500

## Optional G1 Settings

# Save CPU time on large (>= 16GB) heaps by delaying region scanning
# until the heap is 70% full. The default in Hotspot 8u40 is 40%.
-XX:InitiatingHeapOccupancyPercent=70

# For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
# Otherwise equal to the number of cores when 8 or less.
# Machines with > 10 cores should try setting these to <= full cores.
#-XX:ParallelGCThreads=16
# By default, ConcGCThreads is 1/4 of ParallelGCThreads.
# Setting both to the same value can reduce STW durations.
#-XX:ConcGCThreads=16
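# (Worked example, added for illustration: on an 8-vCPU instance such as
#  an m4.2xlarge, the defaults work out to ParallelGCThreads=8 and, per
#  the 1/4 rule above, roughly ConcGCThreads=2.)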

### GC logging options -- uncomment to enable

#-XX:+PrintGCDetails
#-XX:+PrintGCDateStamps
#-XX:+PrintHeapAtGC
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
#-XX:+PrintPromotionFailure
#-XX:PrintFLSStatistics=1
#-Xloggc:/var/log/cassandra/gc.log
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=10
#-XX:GCLogFileSize=10M


From: Alexander Dejanovski <a...@thelastpickle.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, April 3, 2017 at 8:00 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: cassandra OOM

Hi,

Could you share your GC settings? G1 or CMS? Heap size, etc.?

Thanks,

On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva 
<dhruva.go...@aspect.com> wrote:
Hi –
  We’ve had what looks like an OOM situation with Cassandra (we have a dump 
file that got generated) in our staging (performance/load testing environment) 
and I wanted to reach out to this user group to see if you had any 
recommendations on how we should approach our investigation as to the cause of 
this issue. The logs don’t seem to point to any obvious issues, and we’re no 
experts in analyzing this by any means, so was looking for guidance on how to 
proceed. Should we enter a Jira as well? We’re on Cassandra 3.9, and are 
running  a six node cluster. This happened in a controlled load testing 
environment. Feedback will be much appreciated!


Regards,
Dhruva

--
-----------------
Alexander Dejanovski
France
@alexanderdeja

Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
