Gavin, can you look at the stats APIs to see what they report regarding memory? For instance, the following call to the _cat API would return memory usage for fielddata, the filter cache, segments, the index writer and the version map:

curl -XGET 'localhost:9200/_cat/nodes?v&h=v,j,hm,fm,fcm,sm,siwm,svmm'
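The same figures are also exposed as JSON by the node stats API. A rough sketch of that call is below; note that the index writer and version map numbers may not be present on older 1.x releases such as 1.0.1:

```
# Node-level memory breakdown as JSON (ES 1.x node stats API).
# Relevant fields under each node's "indices" section:
#   fielddata.memory_size_in_bytes
#   filter_cache.memory_size_in_bytes
#   segments.memory_in_bytes
#   segments.index_writer_memory_in_bytes   (newer 1.x releases only)
#   segments.version_map_memory_in_bytes    (newer 1.x releases only)
curl -XGET 'localhost:9200/_nodes/stats/indices?pretty'
```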
On Tue, Oct 21, 2014 at 5:01 AM, Gavin Seng <[email protected]> wrote:
>
> Actually, now that I read the bug a little more carefully, I'm not so optimistic.
>
> * The cache here (https://github.com/elasticsearch/elasticsearch/issues/6268) is the filter cache, and mine was only set at 8 gb.
> * Maybe fielddata is a Guava cache ... but I did set it to 30% for a run with a 96 gb heap, so the fielddata cache is 28.8 gb (< 32 gb).
>
> Nonetheless, I'm trying a run now with an explicit 31 gb fielddata cache and will report back.
>
> ### 96 gb heap with 30% fielddata cache and 8 gb filter cache
>
> http://i.imgur.com/FMp49ZZ.png
>
> On Monday, October 20, 2014 9:18:22 PM UTC-4, Gavin Seng wrote:
>>
>> Thanks Adrien, my cache is exactly 32 GB so I'm cautiously optimistic ... will try it out and report back!
>>
>> From Adrien Grand:
>> You might be hit by the following Guava bug: https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed in Elasticsearch 1.1.3/1.2.1/1.3.0.
>>
>> On Monday, October 20, 2014 11:42:34 AM UTC-4, Gavin Seng wrote:
>>>
>>> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>>>
>>> ** Reposting because the first one came out without images and with all kinds of strange spaces.
>>>
>>> Hi,
>>>
>>> We're seeing issues where GC collects less and less memory over time, leading to the need to restart our nodes.
>>>
>>> The following is our setup and what we've tried. Please tell me if anything is lacking and I'll be glad to provide more details.
>>>
>>> We'd also appreciate any advice on how we can improve our configuration.
>>>
>>> ### 32 gb heap
>>>
>>> http://i.imgur.com/JNpWeTw.png
>>>
>>> ### 65 gb heap
>>>
>>> http://i.imgur.com/qcLhC3M.png
>>>
>>> ### 65 gb heap with changed young/old ratio
>>>
>>> http://i.imgur.com/Aa3fOMG.png
>>>
>>> ### Cluster Setup
>>>
>>> * Tribe node that links to 2 clusters
>>> * Cluster 1
>>>   * 3 masters (VMs, master=true, data=false)
>>>   * 2 hot nodes (physical, master=false, data=true)
>>>     * 2 hourly indices (1 for syslog, 1 for application logs)
>>>     * 1 replica
>>>     * Each index ~ 2 million docs (6 gb, excluding replica)
>>>     * Rolled to cold nodes after 48 hrs
>>>   * 2 cold nodes (physical, master=false, data=true)
>>> * Cluster 2
>>>   * 3 masters (VMs, master=true, data=false)
>>>   * 2 hot nodes (physical, master=false, data=true)
>>>     * 1 hourly index
>>>     * 1 replica
>>>     * Each index ~ 8 million docs (20 gb, excluding replica)
>>>     * Rolled to cold nodes after 48 hrs
>>>   * 2 cold nodes (physical, master=false, data=true)
>>>
>>> Interestingly, we're actually having problems on Cluster 1's hot nodes even though it indexes less. This suggests the problem lies with searching, because Cluster 1 is searched a lot more.
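Since Cluster 1 takes much more search traffic, it would be worth confirming that fielddata is really what is filling the old generation. A couple of quick checks, assuming the 1.x cat API (the column aliases are the same ones used in the call at the top of this thread):

```
# Per-field, per-node fielddata usage.
curl -XGET 'localhost:9200/_cat/fielddata?v'

# Heap max vs. fielddata and filter cache totals per node,
# for comparison against the configured limits (30% of heap / 8gb filter cache).
curl -XGET 'localhost:9200/_cat/nodes?v&h=name,hm,fm,fcm'
```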
>>> ### Machine settings (hot node)
>>>
>>> * java
>>>   * java version "1.7.0_11"
>>>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>>>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>>> * 128 gb ram
>>> * 8 cores, 32 cpus
>>> * SSDs (RAID 0)
>>>
>>> ### JVM settings
>>>
>>> ```
>>> java
>>> -Xms96g -Xmx96g -Xss256k
>>> -Djava.awt.headless=true
>>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly
>>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
>>> -XX:+PrintTenuringDistribution
>>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log
>>> -XX:+HeapDumpOnOutOfMemoryError
>>> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>>> -Xloggc:[...]
>>> -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=[...]
>>> -Dcom.sun.management.jmxremote.ssl=[...] -Dcom.sun.management.jmxremote.authenticate=[...]
>>> -Dcom.sun.management.jmxremote.port=[...]
>>> -Delasticsearch -Des.pidfile=[...]
>>> -Des.path.home=/usr/share/elasticsearch
>>> -cp :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>>> -Des.default.path.home=/usr/share/elasticsearch
>>> -Des.default.path.logs=[...]
>>> -Des.default.path.data=[...]
>>> -Des.default.path.work=[...]
>>> -Des.default.path.conf=/etc/elasticsearch
>>> org.elasticsearch.bootstrap.Elasticsearch
>>> ```
>>>
>>> ### Key elasticsearch.yml settings
>>>
>>> * threadpool.bulk.type: fixed
>>> * threadpool.bulk.queue_size: 1000
>>> * indices.memory.index_buffer_size: 30%
>>> * index.translog.flush_threshold_ops: 50000
>>> * indices.fielddata.cache.size: 30%
>>>
>>> ### Search Load (Cluster 1)
>>>
>>> * Mainly Kibana3 (queries ES with a daily alias that expands to 24 hourly indices)
>>> * Jenkins jobs that run constantly and do a lot of faceting/aggregations over the last hour of data
>>>
>>> ### Things we've tried (unsuccessfully)
>>>
>>> * GC settings
>>>   * Young/old ratio
>>>     * Set the young/old ratio to 50/50, hoping objects would get GCed before having the chance to move to the old generation.
>>>     * The old generation grew at a slower rate, but still could not be collected.
>>>   * Survivor space ratio
>>>     * Gave the survivor space a higher share of the young generation.
>>>     * Increased the tenuring threshold (number of young collections survived before promotion to old) to 10, up from 6.
>>>   * Lower CMS occupancy fraction
>>>     * Tried 60%, hoping to kick off GC earlier. GC kicked in earlier but still could not collect.
>>> * Limit the filter/fielddata caches
>>>   * indices.fielddata.cache.size: 32GB
>>>   * indices.cache.filter.size: 4GB
>>> * Optimize each index down to 1 segment in its 3rd hour
>>> * Limit the JVM to a 32 gb heap (see the compressed-oops check sketched after this list)
>>>   * Reference: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
>>> * Limit the JVM to a 65 gb heap
>>>   * This fulfils the 'leave 50% to the OS' principle.
>>> * Read "90.5/7 OOM errors -- memory leak or GC problems?"
>>>   * https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
>>>   * But we're not using term filters
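On the 32 gb experiment, it's worth double-checking whether the chosen heap size actually kept compressed oops; at 32 GB the JVM silently switches to uncompressed 64-bit pointers. A quick check, plain HotSpot and nothing Elasticsearch-specific:

```
# Ask HotSpot whether a given heap size still uses compressed oops.
# The grep'd line shows UseCompressedOops resolved to true or false
# (the -version banner goes to stderr, so it's discarded here).
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
java -Xmx32g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
```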
--
Adrien Grand
