Actually, now that I read the bug a little more carefully, I'm not so optimistic.
* The cache in https://github.com/elasticsearch/elasticsearch/issues/6268 is the filter cache, and mine was only set to 8 GB.
* Maybe fielddata is a Guava cache ... but I did set it to 30% for a run with a 96 GB heap, so the fielddata cache is 28.8 GB (< 32 GB).

Nonetheless, I'm trying a run now with an explicit 31 GB fielddata cache and will report back.
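In case it helps anyone compare, the change for that run is just the one line below. I'm showing it as a shell one-liner against our config path (/etc/elasticsearch, per the command line quoted further down); adjust the path to your install. As far as I know indices.fielddata.cache.size is a static node-level setting, so it has to go on each data node and only takes effect after a restart.

```
# Pin fielddata to an explicit size instead of 30% of the 96 GB heap.
# Static node setting: add to elasticsearch.yml on every data node, then restart.
echo 'indices.fielddata.cache.size: 31gb' >> /etc/elasticsearch/elasticsearch.yml
```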
### 96 GB heap with 30% fielddata cache and 8 GB filter cache

http://i.imgur.com/FMp49ZZ.png

On Monday, October 20, 2014 9:18:22 PM UTC-4, Gavin Seng wrote:
>
> Thanks Adrien, my cache is exactly 32 GB so I'm cautiously optimistic ...
> will try it out and report back!
>
> From Adrien Grand:
> You might be hit by the following Guava bug:
> https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed
> in Elasticsearch 1.1.3/1.2.1/1.3.0.
>
> On Monday, October 20, 2014 11:42:34 AM UTC-4, Gavin Seng wrote:
>>
>> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>>
>> ** Reposting because the first one came out without images and with all
>> kinds of strange spacing.
>>
>> Hi,
>>
>> We're seeing issues where GC collects less and less memory over time,
>> leading to the need to restart our nodes.
>>
>> The following is our setup and what we've tried. Please tell me if
>> anything is lacking and I'll be glad to provide more details.
>>
>> We'd also appreciate any advice on how we can improve our configuration.
>>
>> ### 32 GB heap
>>
>> http://i.imgur.com/JNpWeTw.png
>>
>> ### 65 GB heap
>>
>> http://i.imgur.com/qcLhC3M.png
>>
>> ### 65 GB heap with changed young/old ratio
>>
>> http://i.imgur.com/Aa3fOMG.png
>>
>> ### Cluster Setup
>>
>> * Tribes that link to 2 clusters
>> * Cluster 1
>>   * 3 masters (VMs, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>>     * 2 hourly indices (1 for syslog, 1 for application logs)
>>     * 1 replica
>>     * Each index ~2 million docs (6 GB, excluding replicas)
>>     * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>> * Cluster 2
>>   * 3 masters (VMs, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>>     * 1 hourly index
>>     * 1 replica
>>     * Each index ~8 million docs (20 GB, excluding replicas)
>>     * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>>
>> Interestingly, we're actually having problems on Cluster 1's hot nodes
>> even though it indexes less. This suggests the problem is with searching,
>> because Cluster 1 is searched a lot more.
>>
>> ### Machine settings (hot node)
>>
>> * Java
>>   * java version "1.7.0_11"
>>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>> * 128 GB RAM
>> * 8 cores, 32 CPUs
>> * SSDs (RAID 0)
>>
>> ### JVM settings
>>
>> ```
>> java
>> -Xms96g -Xmx96g -Xss256k
>> -Djava.awt.headless=true
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log
>> -XX:+HeapDumpOnOutOfMemoryError
>> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>> -Xloggc:[...]
>> -Dcom.sun.management.jmxremote
>> -Dcom.sun.management.jmxremote.local.only=[...]
>> -Dcom.sun.management.jmxremote.ssl=[...]
>> -Dcom.sun.management.jmxremote.authenticate=[...]
>> -Dcom.sun.management.jmxremote.port=[...]
>> -Delasticsearch -Des.pidfile=[...]
>> -Des.path.home=/usr/share/elasticsearch -cp
>> :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>> -Des.default.path.home=/usr/share/elasticsearch
>> -Des.default.path.logs=[...]
>> -Des.default.path.data=[...]
>> -Des.default.path.work=[...]
>> -Des.default.path.conf=/etc/elasticsearch
>> org.elasticsearch.bootstrap.Elasticsearch
>> ```
>>
>> ### Key elasticsearch.yml settings
>>
>> * threadpool.bulk.type: fixed
>> * threadpool.bulk.queue_size: 1000
>> * indices.memory.index_buffer_size: 30%
>> * index.translog.flush_threshold_ops: 50000
>> * indices.fielddata.cache.size: 30%
>>
>> ### Search Load (Cluster 1)
>>
>> * Mainly Kibana3 (queries ES with a daily alias that expands to 24 hourly indices)
>> * Jenkins jobs that run constantly and do many facet/aggregation queries over the last hour of data
>>
>> ### Things we've tried (unsuccessfully)
>>
>> * GC settings
>>   * Young/old ratio
>>     * Set the young/old ratio to 50/50, hoping that objects would get GCed before having the chance to move to old gen.
>>     * The old gen grew at a slower rate, but still could not be collected.
>>   * Survivor space ratio
>>     * Gave survivor space a higher ratio of the young gen.
>>     * Increased the number of generations needed to reach old gen to 10 (up from 6).
>>   * Lower CMS occupancy fraction
>>     * Tried 60%, hoping to kick off GC earlier. GC kicked in earlier but still could not collect.
>> * Limit filter/fielddata cache
>>   * indices.fielddata.cache.size: 32GB
>>   * indices.cache.filter.size: 4GB
>> * Optimizing each index to 1 segment in its 3rd hour
>> * Limit the JVM to 32 GB of RAM
>>   * Reference: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
>> * Limit the JVM to 65 GB of RAM
>>   * This fulfils the 'leave 50% to the OS' principle.
>> * Read "90.5/7 OOM errors -- memory leak or GC problems?"
>>   * https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
>>   * But we're not using term filters
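P.S. If anyone wants to compare numbers: I'm reading the actual per-node fielddata and filter cache sizes off the nodes stats API, roughly like this (assumes a node listening on localhost:9200; adjust host/port to your setup):

```
# Per-node fielddata and filter cache memory, straight from the nodes stats API.
# The -A 2 just pulls in the memory_size / evictions lines under each section.
curl -s 'http://localhost:9200/_nodes/stats/indices?pretty' \
  | grep -E -A 2 '"fielddata"|"filter_cache"'
```

That's what I'm using to sanity-check whether either cache actually gets anywhere near the 32 GB mark.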

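P.P.S. For the "keep the heap below ~32 GB" experiments from the limiting-memory-usage guide above: this is how I've been checking whether a given -Xmx still gets compressed oops. It's a HotSpot-specific flag and the exact cutoff varies between JVM builds, so treat it as a rough check:

```
# Ask HotSpot whether compressed oops are still enabled at this heap size.
# (-version prints to stderr, the flag dump to stdout, hence the redirect.)
java -Xmx31g -XX:+PrintFlagsFinal -version 2>/dev/null | grep UseCompressedOops
# Expect ":= true" here; once -Xmx creeps past the cutoff it flips to false
# and every object reference doubles in size.
```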