Re: cassandra OOM

Alexander Dejanovski Mon, 03 Apr 2017 22:32:39 -0700

Hi,

We've repeatedly seen G1GC go OOM on production clusters with a 16GB heap
when the workload is intense, and given you're running on m4.2xl instances, I
wouldn't go over 16GB for the heap.

I'd suggest reverting to CMS, with a 16GB heap and up to 6GB of new gen.
You can start with a MaxTenuringThreshold of 5 and enable GC logging to
fine-tune the settings afterwards.

FYI, in our experience CMS tends to perform better than G1, even though it's
a little harder to tune.

Cheers,

On Mon, Apr 3, 2017 at 10:54 PM Gopal, Dhruva <dhruva.go...@aspect.com>
wrote:

> 16 Gig heap, with G1. Pertinent info from jvm.options below (we’re using
> m2.2xlarge instances in AWS):
>
> #################
> # HEAP SETTINGS #
> #################
>
> # Heap size is automatically calculated by cassandra-env based on this
> # formula: max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB))
> # That is:
> # - calculate 1/2 ram and cap to 1024MB
> # - calculate 1/4 ram and cap to 8192MB
> # - pick the max
> #
> # For production use you may wish to adjust this for your environment.
> # If that's the case, uncomment the -Xmx and -Xms options below to override the
> # automatic calculation of JVM heap memory.
> #
> # It is recommended to set min (-Xms) and max (-Xmx) heap sizes to
> # the same value to avoid stop-the-world GC pauses during resize, and
> # so that we can lock the heap in memory on startup to prevent any
> # of it from being swapped out.
> -Xms16G
> -Xmx16G
>
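The default-heap formula in the comment above can be written out as a small Python function. This is a sketch that mirrors the comment using integer megabytes; the function name is hypothetical. Assuming an m4.2xlarge with 32 GiB of RAM, the automatic calculation would give an 8 GB heap, so the `-Xms16G`/`-Xmx16G` override doubles it.

```python
def default_max_heap_mb(ram_mb: int) -> int:
    """Default max heap per the comment:
    max(min(1/2 ram, 1024MB), min(1/4 ram, 8GB)), all in MB."""
    half_capped = min(ram_mb // 2, 1024)     # 1/2 RAM, capped at 1024 MB
    quarter_capped = min(ram_mb // 4, 8192)  # 1/4 RAM, capped at 8192 MB
    return max(half_capped, quarter_capped)  # pick the larger of the two


if __name__ == "__main__":
    ram_mb = 32 * 1024  # assumption: m4.2xlarge, 32 GiB of RAM
    print(default_max_heap_mb(ram_mb))  # 8192
```

Any machine with 32 GiB of RAM or more lands on the 8192 MB cap; below 4 GiB, the 1/2-RAM branch (capped at 1024 MB) wins.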
> # Young generation size is automatically calculated by cassandra-env
> # based on this formula: min(100 * num_cores, 1/4 * heap size)
> #
> # The main trade-off for the young generation is that the larger it
> # is, the longer GC pause times will be. The shorter it is, the more
> # expensive GC will be (usually).
> #
> # It is not recommended to set the young generation size if using the
> # G1 GC, since that will override the target pause-time goal.
> # More info: http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
> #
> # The example below assumes a modern 8-core+ machine for decent
> # times. If in doubt, and if you do not particularly want to tweak, go
> # 100 MB per physical CPU core.
> #-Xmn800M
>
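Likewise, here is a sketch of the young-generation formula above (hypothetical function name; `num_cores` stands for whatever core count cassandra-env detects). Assuming 8 cores, as on an 8-vCPU m4.2xlarge, the 100 MB-per-core term caps the young generation at 800 MB whatever the heap size. That matches the `-Xmn800M` example and is far below the 6GB suggested for CMS earlier in the thread. As the comment notes, this setting is meant for CMS; with G1 it should stay unset.

```python
def default_young_gen_mb(num_cores: int, heap_mb: int) -> int:
    """Default young generation size per the comment:
    min(100 * num_cores, 1/4 * heap size), in MB."""
    per_core_cap = 100 * num_cores  # 100 MB per core
    quarter_heap = heap_mb // 4     # never more than 1/4 of the heap
    return min(per_core_cap, quarter_heap)


if __name__ == "__main__":
    print(default_young_gen_mb(8, 16 * 1024))  # 800
```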
> #################
> #  GC SETTINGS  #
> #################
>
> ### CMS Settings
>
> #-XX:+UseParNewGC
> #-XX:+UseConcMarkSweepGC
> #-XX:+CMSParallelRemarkEnabled
> #-XX:SurvivorRatio=8
> #-XX:MaxTenuringThreshold=1
> #-XX:CMSInitiatingOccupancyFraction=75
> #-XX:+UseCMSInitiatingOccupancyOnly
> #-XX:CMSWaitDuration=10000
> #-XX:+CMSParallelInitialMarkEnabled
> #-XX:+CMSEdenChunksRecordAlways
> # some JVMs will fill up their heap when accessed via JMX, see CASSANDRA-6541
> #-XX:+CMSClassUnloadingEnabled
>
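To see what these CMS flags imply for the 16GB heap with 6GB of new gen suggested earlier in the thread, here is a small Python sketch (hypothetical helper name; sizes in MB). It relies on two standard HotSpot relationships: `SurvivorRatio` is the ratio of eden to one survivor space, and the young generation holds eden plus two survivor spaces. With `UseCMSInitiatingOccupancyOnly`, a CMS cycle starts when the old generation reaches `CMSInitiatingOccupancyFraction` percent.

```python
def cms_layout_mb(heap_mb, young_mb, survivor_ratio=8, cms_occupancy_pct=75):
    """Split a CMS heap into eden, survivor and old generation sizes (MB)."""
    if not 0 < young_mb < heap_mb:
        raise ValueError("young generation must be positive and smaller than the heap")
    # young = eden + 2 * survivor, and eden = survivor_ratio * survivor
    survivor = young_mb / (survivor_ratio + 2)
    eden = young_mb - 2 * survivor
    old = heap_mb - young_mb
    # Old-gen occupancy at which a CMS cycle is initiated.
    cms_trigger = old * cms_occupancy_pct / 100
    return {"eden": eden, "survivor": survivor, "old": old,
            "cms_trigger": cms_trigger}


if __name__ == "__main__":
    # 16GB heap, 6GB new gen, stock SurvivorRatio and occupancy fraction.
    for name, size in cms_layout_mb(16 * 1024, 6 * 1024).items():
        print(f"{name:>11}: {size:8.1f} MB")
```

With these numbers each survivor space is about 614 MB, eden about 4.9 GB and the old generation 10 GB, with CMS kicking in at 7.5 GB. A MaxTenuringThreshold of 5 lets objects be copied between the survivor spaces up to five times before promotion, as long as they fit.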
> ### G1 Settings (experimental, comment previous section and uncomment section below to enable)
>
> ## Use the Hotspot garbage-first collector.
> -XX:+UseG1GC
> #
> ## Have the JVM do less remembered set work during STW, instead
> ## preferring concurrent GC. Reduces p99.9 latency.
> -XX:G1RSetUpdatingPauseTimePercent=5
> #
> ## Main G1GC tunable: lowering the pause target will lower throughput and vice versa.
> ## 200ms is the JVM default and lowest viable setting
> ## 1000ms increases throughput. Keep it smaller than the timeouts in cassandra.yaml.
> -XX:MaxGCPauseMillis=500
>
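The advice to keep the pause target below the cassandra.yaml timeouts can be checked mechanically. The sketch below assumes the stock Cassandra 3.x defaults for the three main request timeouts (verify them against your own cassandra.yaml); the dictionary and function names are hypothetical. Keep in mind that `MaxGCPauseMillis` is a goal G1 tries to meet, not a guarantee, so individual pauses can still exceed it.

```python
# Assumed stock cassandra.yaml values for Cassandra 3.x; check your own file.
DEFAULT_TIMEOUTS_MS = {
    "read_request_timeout_in_ms": 5000,
    "range_request_timeout_in_ms": 10000,
    "write_request_timeout_in_ms": 2000,
}


def tightest_timeout(timeouts_ms):
    """Return (name, value) of the smallest configured timeout."""
    name = min(timeouts_ms, key=timeouts_ms.get)
    return name, timeouts_ms[name]


def pause_target_ok(max_gc_pause_ms, timeouts_ms=DEFAULT_TIMEOUTS_MS):
    """True if the G1 pause goal is strictly below every request timeout."""
    _, limit_ms = tightest_timeout(timeouts_ms)
    return max_gc_pause_ms < limit_ms


if __name__ == "__main__":
    print(tightest_timeout(DEFAULT_TIMEOUTS_MS))  # ('write_request_timeout_in_ms', 2000)
    print(pause_target_ok(500))                   # True
```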
> ## Optional G1 Settings
>
> # Save CPU time on large (>= 16GB) heaps by delaying region scanning
> # until the heap is 70% full. The default in Hotspot 8u40 is 40%.
> -XX:InitiatingHeapOccupancyPercent=70
>
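A quick worked example of what `InitiatingHeapOccupancyPercent` means on the 16GB heap above (a sketch; function names are hypothetical). In JDK 8, G1 starts a concurrent marking cycle once occupancy of the whole heap crosses this percentage; Oracle's G1 tuning documentation lists 45% as the default, rather than the 40% stated in the comment. At 70%, marking starts at about 11.2 GB, leaving roughly 4.8 GB for allocations while it runs; if that headroom runs out first, G1 can end up in a full, stop-the-world collection.

```python
def ihop_threshold_mb(heap_mb, ihop_pct):
    """Heap occupancy (MB) at which G1 starts concurrent marking (JDK 8)."""
    return heap_mb * ihop_pct / 100


def marking_headroom_mb(heap_mb, ihop_pct):
    """Free heap (MB) left for allocations once marking has been triggered."""
    return heap_mb - ihop_threshold_mb(heap_mb, ihop_pct)


if __name__ == "__main__":
    heap_mb = 16 * 1024
    for pct in (45, 70):  # documented JDK 8 default vs. this jvm.options
        print(pct, ihop_threshold_mb(heap_mb, pct), marking_headroom_mb(heap_mb, pct))
```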
> # For systems with > 8 cores, the default ParallelGCThreads is 5/8 the number of logical cores.
> # Otherwise equal to the number of cores when 8 or less.
> # Machines with > 10 cores should try setting these to <= full cores.
> #-XX:ParallelGCThreads=16
> # By default, ConcGCThreads is 1/4 of ParallelGCThreads.
> # Setting both to the same value can reduce STW durations.
> #-XX:ConcGCThreads=16
>
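The thread-count comments above are approximations. As we understand HotSpot's JDK 8 ergonomics on x86, `ParallelGCThreads` equals the CPU count up to 8, after which only 5/8 of the CPUs beyond the first 8 are added; G1 then derives `ConcGCThreads` as roughly a quarter of that, computed as `(ParallelGCThreads + 2) / 4` with a minimum of 1. The sketch below (hypothetical function names) encodes those rules. On an 8-vCPU m4.2xlarge it gives 8 parallel and 2 concurrent threads, so the "> 10 cores" advice would not apply there.

```python
def default_parallel_gc_threads(cpus: int) -> int:
    """JDK 8 x86 default: all CPUs up to 8, then 5/8 of the remainder."""
    if cpus <= 8:
        return cpus
    return 8 + (cpus - 8) * 5 // 8


def default_g1_conc_gc_threads(parallel_gc_threads: int) -> int:
    """JDK 8 G1 default: about a quarter of ParallelGCThreads, at least 1."""
    return max((parallel_gc_threads + 2) // 4, 1)


if __name__ == "__main__":
    for cpus in (8, 16, 32):
        parallel = default_parallel_gc_threads(cpus)
        print(cpus, parallel, default_g1_conc_gc_threads(parallel))
```

For 16 CPUs this gives 13 parallel threads, not the 10 that "5/8 the number of logical cores" would suggest.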
> ### GC logging options -- uncomment to enable
>
> #-XX:+PrintGCDetails
> #-XX:+PrintGCDateStamps
> #-XX:+PrintHeapAtGC
> #-XX:+PrintTenuringDistribution
> #-XX:+PrintGCApplicationStoppedTime
> #-XX:+PrintPromotionFailure
> #-XX:PrintFLSStatistics=1
> #-Xloggc:/var/log/cassandra/gc.log
> #-XX:+UseGCLogFileRotation
> #-XX:NumberOfGCLogFiles=10
> #-XX:GCLogFileSize=10M
>
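Once these logging options are enabled, a quick way to get a feel for stop-the-world pauses is to collect the "Total time for which application threads were stopped" lines written by `-XX:+PrintGCApplicationStoppedTime`. The Python sketch below assumes the JDK 8 wording of that line and a "." decimal separator; the function names, script name and the 0.5 s threshold (chosen to match `MaxGCPauseMillis=500`) are illustrative assumptions, not part of any Cassandra tooling. This line also counts non-GC safepoints, so treat the totals as an upper bound on GC pause time.

```python
import re
import sys

# Assumed JDK 8 wording of -XX:+PrintGCApplicationStoppedTime output.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds")


def stopped_times(lines):
    """Yield each stop-the-world pause (in seconds) found in GC log lines."""
    for line in lines:
        match = STOPPED_RE.search(line)
        if match:
            yield float(match.group(1))


def summarize(lines, threshold_s=0.5):
    """Count pauses, their total, the worst one, and how many exceed threshold_s."""
    pauses = list(stopped_times(lines))
    return {
        "count": len(pauses),
        "total_s": sum(pauses),
        "max_s": max(pauses, default=0.0),
        "over_threshold": sum(1 for p in pauses if p > threshold_s),
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python gc_pauses.py <path-to-gc-log>
    with open(sys.argv[1]) as log:
        print(summarize(log))
```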
> *From: *Alexander Dejanovski <a...@thelastpickle.com>
> *Reply-To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Date: *Monday, April 3, 2017 at 8:00 AM
> *To: *"user@cassandra.apache.org" <user@cassandra.apache.org>
> *Subject: *Re: cassandra OOM
>
> Hi,
>
> could you share your GC settings? G1 or CMS? Heap size, etc.
>
> Thanks,
>
> On Sun, Apr 2, 2017 at 10:30 PM Gopal, Dhruva <dhruva.go...@aspect.com>
> wrote:
> Hi –
>
> We’ve had what looks like an OOM situation with Cassandra (a dump file
> was generated) in our staging (performance/load testing) environment, and
> I wanted to reach out to this user group for recommendations on how to
> approach investigating the cause. The logs don’t seem to point to any
> obvious issues, and we’re by no means experts in analyzing this, so we're
> looking for guidance on how to proceed. Should we file a Jira as well?
> We’re on Cassandra 3.9, running a six-node cluster. This happened in a
> controlled load testing environment. Feedback will be much appreciated!
>
> Regards,
>
> Dhruva
>
> This email (including any attachments) is proprietary to Aspect Software,
> Inc. and may contain information that is confidential. If you have received
> this message in error, please do not read, copy or forward this message.
> Please notify the sender immediately, delete it from your system and
> destroy any copies. You may not further disclose or distribute this email
> or its attachments.
>
> --
> -----------------
> Alexander Dejanovski
> France
> @alexanderdeja
>
> Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
