Hi Gavin,

You might be hitting the following Guava bug: https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed in Elasticsearch 1.1.3 / 1.2.1 / 1.3.0.
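Since the fix shipped on three release branches, checking whether a given node version already includes it is a small version comparison. A minimal sketch; the `has_fix` helper is hypothetical (not an Elasticsearch API), and the version list is taken from this message:

```python
# Versions in which the Guava-related leak was fixed, per this thread:
# 1.1.3, 1.2.1, and 1.3.0.
FIXED = [(1, 1, 3), (1, 2, 1), (1, 3, 0)]

def parse(version):
    """Turn '1.0.1' into (1, 0, 1) for tuple comparison."""
    return tuple(int(p) for p in version.split("."))

def has_fix(version):
    """True if this Elasticsearch version contains the fix.

    A version has the fix when it is at or above the fix release
    on its own minor branch, or on any branch after 1.3.
    """
    v = parse(version)
    for fix in FIXED:
        if v[:2] == fix[:2]:       # same 1.x minor branch
            return v >= fix
    return v >= FIXED[-1]          # 1.3.0 and anything later

print(has_fix("1.0.1"))  # the version in this thread -> False
print(has_fix("1.2.1"))  # -> True
```

On the 1.0.1 cluster described below, this would flag every node as affected.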
On Mon, Oct 20, 2014 at 3:27 PM, Gavin Seng <[email protected]> wrote:

> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>
> Hi,
>
> We're seeing issues where GC collects less and less memory over time, eventually forcing us to restart our nodes.
>
> The following is our setup and what we've tried. Please tell me if anything is lacking and I'll be glad to provide more details.
>
> We'd also appreciate any advice on how we can improve our configuration.
>
> Thank you for any help!
>
> Gavin
>
> ### Cluster Setup
>
> * Tribe nodes that link to 2 clusters
> * Cluster 1
>   * 3 masters (VMs, master=true, data=false)
>   * 2 hot nodes (physical, master=false, data=true)
>     * 2 hourly indices (1 for syslog, 1 for application logs)
>     * 1 replica
>     * Each index ~2 million docs (6 GB, excluding replica)
>     * Rolled to cold nodes after 48 hrs
>   * 2 cold nodes (physical, master=false, data=true)
> * Cluster 2
>   * 3 masters (VMs, master=true, data=false)
>   * 2 hot nodes (physical, master=false, data=true)
>     * 1 hourly index
>     * 1 replica
>     * Each index ~8 million docs (20 GB, excluding replica)
>     * Rolled to cold nodes after 48 hrs
>   * 2 cold nodes (physical, master=false, data=true)
>
> Interestingly, we're actually having problems on Cluster 1's hot nodes even though it indexes less. This suggests the problem lies with searching, because Cluster 1 is searched a lot more.
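The figures above imply a substantial amount of live data on Cluster 1's hot tier, which matters for heap pressure from fielddata and wide Kibana queries. A back-of-the-envelope sketch using only the numbers quoted above (this is arithmetic, not measured usage; even shard balancing is assumed):

```python
# Cluster 1 hot tier, using the figures from the message above.
HOURS_RETAINED = 48      # indices roll to cold nodes after 48 hrs
INDICES_PER_HOUR = 2     # 1 syslog index + 1 application-log index
GB_PER_INDEX = 6         # ~2 million docs, excluding replica
REPLICAS = 1
HOT_NODES = 2

primary_gb = HOURS_RETAINED * INDICES_PER_HOUR * GB_PER_INDEX
total_gb = primary_gb * (1 + REPLICAS)   # primaries + replicas
per_node_gb = total_gb / HOT_NODES       # assuming even balancing

print(primary_gb)   # 576
print(total_gb)     # 1152
print(per_node_gb)  # 576.0
```

So each hot node holds on the order of hundreds of gigabytes of searchable data, and a Kibana daily alias fans every query out across dozens of those hourly indices, which is consistent with the search-heavy cluster being the one under pressure.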
> ### Machine settings (hot node)
>
> * java
>   * java version "1.7.0_11"
>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
> * 128 GB RAM
> * 8 cores, 32 CPUs
> * SSDs (RAID 0)
>
> ### JVM settings
>
> ```
> java
> -Xms96g -Xmx96g -Xss256k
> -Djava.awt.headless=true
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
> -XX:+PrintTenuringDistribution
> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log
> -XX:+HeapDumpOnOutOfMemoryError
> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
> -Xloggc:[...]
> -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.local.only=[...]
> -Dcom.sun.management.jmxremote.ssl=[...]
> -Dcom.sun.management.jmxremote.authenticate=[...]
> -Dcom.sun.management.jmxremote.port=[...]
> -Delasticsearch -Des.pidfile=[...]
> -Des.path.home=/usr/share/elasticsearch
> -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=[...]
> -Des.default.path.data=[...]
> -Des.default.path.work=[...]
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
> ```
>
> ### Key elasticsearch.yml settings
>
> * threadpool.bulk.type: fixed
> * threadpool.bulk.queue_size: 1000
> * indices.memory.index_buffer_size: 30%
> * index.translog.flush_threshold_ops: 50000
> * indices.fielddata.cache.size: 30%
>
> ### Search Load (Cluster 1)
>
> * Mainly Kibana 3 (queries ES with a daily alias that expands to 24 hourly indices)
> * Jenkins jobs that run constantly and do many faceting/aggregation queries over the last hour's data
>
> ### Things we've tried (unsuccessfully)
>
> * GC settings
>   * Young/old ratio
>     * Set the young/old ratio to 50/50, hoping that objects would be GCed before having the chance to move to old gen.
>     * The old gen grew at a slower rate, but its contents still could not be collected.
>   * Survivor space ratio
>     * Gave the survivor spaces a higher ratio of the young gen
>     * Increased the tenuring threshold (collections survived before promotion to old gen) to 10, up from 6
>   * Lower CMS occupancy ratio
>     * Tried 60%, hoping to trigger GC earlier. GC kicked in earlier but still could not collect.
> * Limiting the filter/field caches
>   * indices.fielddata.cache.size: 32GB
>   * indices.cache.filter.size: 4GB
> * Optimizing each index down to 1 segment on the 3rd hour
> * Limiting the JVM to 32 GB of RAM
>   * Reference: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
> * Limiting the JVM to 65 GB of RAM
>   * This fulfils the 'leave 50% to the OS' principle.
> * Read "90.5/7 OOM errors -- memory leak or GC problems?"
>   * https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
>   * But we're not using term filters
>
> ### 32 GB Heap
>
> <https://lh3.googleusercontent.com/-T6uUeqSFhns/VEUMsYWwukI/AAAAAAAABm0/eayuSevxWNY/s1600/es_32gb.png>
>
> ### 65 GB Heap
>
> <https://lh4.googleusercontent.com/-C9ScRI9pO2A/VEUM6uxcJ-I/AAAAAAAABm8/iGqqKemt4aw/s1600/es_65gb.png>
>
> ### 65 GB Heap with changed young/old ratio
>
> <https://lh4.googleusercontent.com/-Ugzr4PQv_uE/VEUNGy-zguI/AAAAAAAABnE/FbnhnVHQQ20/s1600/es_65gb_yo.png>
>
> --
> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/33c2b354-975d-4340-ad77-4d675e577339%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

--
Adrien Grand

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAL6Z4j6MOySTUytZLN4%2BQ6iWuysXpb7BmzZMxGjzqBqEyWR%2BpQ%40mail.gmail.com.
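The generation-sizing experiments listed in Gavin's message (50/50 young/old split, survivor-space ratio, tenuring threshold) can be made concrete with HotSpot's standard sizing arithmetic. A sketch of those formulas under the flags discussed in this thread; the function is illustrative, not output from the cluster in question:

```python
def generation_sizes(heap_gb, new_ratio, survivor_ratio):
    """Approximate HotSpot generation sizes.

    new_ratio:      old/young ratio (-XX:NewRatio); 1 means a 50/50 split.
    survivor_ratio: eden/survivor ratio (-XX:SurvivorRatio); each of the
                    two survivor spaces gets young / (survivor_ratio + 2).
    """
    young = heap_gb / (new_ratio + 1)
    old = heap_gb - young
    survivor = young / (survivor_ratio + 2)
    eden = young - 2 * survivor
    return {"young": young, "old": old, "eden": eden, "survivor": survivor}

# The 50/50 experiment from the thread, on the 96 GB heap
# (SurvivorRatio=8 is the HotSpot default, assumed here):
sizes = generation_sizes(96, new_ratio=1, survivor_ratio=8)
print(sizes["young"], sizes["old"])  # 48.0 48.0
print(round(sizes["survivor"], 1))   # 4.8
```

The arithmetic also shows why no resizing helped: if old-gen objects stay strongly reachable, as with the Guava bug referenced at the top of this thread, CMS cannot reclaim them no matter how the generations are partitioned.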
