[ 
https://issues.apache.org/jira/browse/CASSANDRA-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jonathan lacefield updated CASSANDRA-8447:
------------------------------------------
    Attachment: memtable_debug

memtable information from system.log

> Nodes stuck in CMS GC cycle with very little traffic when compaction is 
> enabled
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8447
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8447
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cluster size - 4 nodes
> Node size - 12 CPU (hyper threaded to 24 cores), 192 GB RAM, 2 Raid 0 arrays 
> (Data - 10 disk, spinning 10k drives | CL 2 disk, spinning 10k drives)
> OS - RHEL 6.5
> jvm - oracle 1.7.0_71
> Cassandra version 2.0.11
>            Reporter: jonathan lacefield
>         Attachments: Node_with_compaction.png, Node_without_compaction.png, 
> cassandra.yaml, gc.logs.tar.gz, gcinspector_messages.txt, memtable_debug, 
> results.tar.gz, visualvm_screenshot
>
>
> Behavior - If autocompaction is enabled, nodes will become unresponsive due 
> to a full Old Gen heap which is not cleared during CMS GC.
> Test methodology - disabled autocompaction on 3 nodes, left autocompaction 
> enabled on 1 node.  Executed different Cassandra stress loads, using write 
> only operations.  Monitored visualvm and jconsole for heap pressure.  
> Captured iostat and dstat for most tests.  Captured heap dump from 50 thread 
> load.  Hints were disabled for testing on all nodes to alleviate GC noise due 
> to hints backing up.
> Data load test through Cassandra stress -  /usr/bin/cassandra-stress  write 
> n=1900000000 -rate threads=<different threads tested> -schema  
> replication\(factor=3\)  keyspace="Keyspace1" -node <all nodes listed>
> Data load thread count and results:
> * 1 thread - Still running but looks like the node can sustain this load 
> (approx 500 writes per second per node)
> * 5 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
> measured in the 60 second range (approx 2k writes per second per node)
> * 10 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
> measured in the 60 second range
> * 50 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
> measured in the 60 second range  (approx 10k writes per second per node)
> * 100 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
> measured in the 60 second range  (approx 20k writes per second per node)
> * 200 threads - Nodes become unresponsive due to full Old Gen Heap.  CMS 
> measured in the 60 second range  (approx 25k writes per second per node)
> Note - the observed behavior was the same for all tests except for the single 
> threaded test.  The single threaded test does not appear to show this 
> behavior.
> Tested different GC and Linux OS settings with a focus on the 50 and 200 
> thread loads.  
> JVM settings tested:
> #  default, out of the box, env-sh settings
> #  10 G Max | 1 G New - default env-sh settings
> #  10 G Max | 1 G New - default env-sh settings
> #* JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=50"
> #   20 G Max | 10 G New 
>    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
>    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>    JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
>    JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
>    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>    JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>    JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
>    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
> # 20 G Max | 1 G New 
>    JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>    JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"
>    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>    JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
>    JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
>    JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=60000"
>    JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=30000"
>    JVM_OPTS="$JVM_OPTS -XX:ParallelGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=12"
>    JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
>    JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
>    JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
>    JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
>    JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
> Linux OS settings tested:
> # Disabled Transparent Huge Pages
> echo never > /sys/kernel/mm/transparent_hugepage/enabled
> echo never > /sys/kernel/mm/transparent_hugepage/defrag
> # Enabled Huge Pages
> echo 21500000000 > /proc/sys/kernel/shmmax (over 20GB for heap)
> echo 1536 > /proc/sys/vm/nr_hugepages (20GB/2MB page size)
> # Disabled NUMA
> numa-off in /etc/grub.confdatastax
> # Verified all settings documented here were implemented
>   
> http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installRecommendSettings.html
> Attachments:
> #  .yaml
> #  fio output - results.tar.gz
> #  50 thread heap dump - 
> https://drive.google.com/a/datastax.com/file/d/0B4Imdpu2YrEbMGpCZW5ta2liQ2c/view?usp=sharing
> #  100 thread - visual vm anonymous screenshot - visualvm_screenshot
> #  dstat screen shot of with compaction - Node_with_compaction.png
> #  dstat screen shot of without compaction -- Node_without_compaction.png
> #  gcinspector messages from system.log
> # gc.log output - gc.logs.tar.gz
> Observations:
> #  even though this is a spinning disk implementation, disk io looks good. 
> #* output from Jshook perf monitor https://github.com/jshook/perfscripts is 
> attached
> #* note, we leveraged direct io for all tests by adding direct=1 to the 
> .global config files
> #  cpu usage is moderate until large GC events occur
> #  once old gen heap fills up and cannot clean, memtable post flushers start 
> to back up (show a lot pending) via tpstats
> #  the node itself, i.e. ssh, is still responsive but the Cassandra instance 
> becomes unresponsive
> # once old gen heap fills up Cassandra stress starts to throw CL ONE errors 
> stating there aren't enough replicas to satisfy....
> #  heap dump from 50 thread, JVM scenario 1 is attached
> #* appears to show a compaction thread consuming a lot of memory
> #  sample system.log output for gc issues
> #  strace -e futex -p $PID -f -c output during 100 thread load and during old 
> gen "filling", just before full
> % time    seconds  usecs/call    calls    errors syscall
> 100.00  244.886766        4992    49052      7507 futex
> 100.00  244.886766                49052      7507 total
> #  htop during full gc cycle  - 
> https://s3.amazonaws.com/uploads.hipchat.com/6528/480117/4ZlgcoNScb6kRM2/upload.png
> #  nothing is blocked via tpstats on these nodes
> #  compaction does have pending tasks, upwards of 20, on the nodes
> #  Nodes without compaction achieved approximately 20k writes per second per 
> node without errors or drops
> Next Steps:
> #  Will try to create a flame graph and update load here - 
> http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
> #  Will try to recreate in another environment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to