Please post all updates here (has pictures and better 
formatting): 
https://groups.google.com/forum/?fromgroups=#!topic/elasticsearch/VxkosQuKzaA

Thanks Adrien, I've cross-posted your reply in the other post.

On Monday, October 20, 2014 3:57:56 PM UTC-4, Adrien Grand wrote:
>
> Hi Gavin,
>
> You might be hit by the following Guava bug: 
> https://github.com/elasticsearch/elasticsearch/issues/6268. It was fixed 
> in Elasticsearch 1.1.3/1.2.1/1.3.0
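>
> To confirm whether a node is on an affected release, the version is
> reported at the root endpoint; a quick check (assuming the default
> HTTP port) might look like:
>
> ```
> curl -s localhost:9200
> # the response includes e.g. "version" : { "number" : "1.0.1", ... }
> ```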
>
> On Mon, Oct 20, 2014 at 3:27 PM, Gavin Seng <[email protected]> wrote:
>
>>
>> ### JRE 1.7.0_11 / ES 1.0.1 - GC not collecting old gen / Memory Leak?
>>
>> Hi,
>>
>> We're seeing issues where GC collects less and less memory over time,
>> leading to the need to restart our nodes.
>>
>> The following is our setup and what we've tried. Please tell me if
>> anything is lacking and I'll be glad to provide more details. We'd
>> also appreciate any advice on how we can improve our configuration.
>>
>> Thank you for any help!
>>
>> Gavin
>>
>> ### Cluster Setup
>>
>> * Tribes that link to 2 clusters
>> * Cluster 1
>>   * 3 masters (VMs, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>>     * 2 hourly indices (1 for syslog, 1 for application logs)
>>     * 1 replica
>>     * Each index ~2 million docs (6gb, excl. of replica)
>>     * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>> * Cluster 2
>>   * 3 masters (VMs, master=true, data=false)
>>   * 2 hot nodes (physical, master=false, data=true)
>>     * 1 hourly index
>>     * 1 replica
>>     * Each index ~8 million docs (20gb, excl. of replica)
>>     * Rolled to cold nodes after 48 hrs
>>   * 2 cold nodes (physical, master=false, data=true)
>>
>> Interestingly, we're actually having problems on Cluster 1's hot nodes
>> even though it indexes less. This suggests the problem lies with
>> searching, because Cluster 1 is searched a lot more.
>>
>> ### Machine settings (hot node)
>>
>> * java
>>   * java version "1.7.0_11"
>>   * Java(TM) SE Runtime Environment (build 1.7.0_11-b21)
>>   * Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>> * 128gb ram
>> * 8 cores, 32 cpus
>> * ssds (raid 0)
>>
>> ### JVM settings
>>
>> ```
>> java
>> -Xms96g -Xmx96g -Xss256k
>> -Djava.awt.headless=true
>> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>> -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly
>> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
>> -XX:+PrintTenuringDistribution
>> -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/elasticsearch/gc.log
>> -XX:+HeapDumpOnOutOfMemoryError
>> -verbose:gc -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation
>> -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10M
>> -Xloggc:[...]
>> -Dcom.sun.management.jmxremote
>> -Dcom.sun.management.jmxremote.local.only=[...]
>> -Dcom.sun.management.jmxremote.ssl=[...]
>> -Dcom.sun.management.jmxremote.authenticate=[...]
>> -Dcom.sun.management.jmxremote.port=[...]
>> -Delasticsearch -Des.pidfile=[...]
>> -Des.path.home=/usr/share/elasticsearch -cp
>> :/usr/share/elasticsearch/lib/elasticsearch-1.0.1.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
>> -Des.default.path.home=/usr/share/elasticsearch
>> -Des.default.path.logs=[...]
>> -Des.default.path.data=[...]
>> -Des.default.path.work=[...]
>> -Des.default.path.conf=/etc/elasticsearch
>> org.elasticsearch.bootstrap.Elasticsearch
>> ```
>>
>> ### Key elasticsearch.yml settings
>>
>> * threadpool.bulk.type: fixed
>> * threadpool.bulk.queue_size: 1000
>> * indices.memory.index_buffer_size: 30%
>> * index.translog.flush_threshold_ops: 50000
>> * indices.fielddata.cache.size: 30%
>>
>>                                                          
>> ### Search Load (Cluster 1)                                               
>>                                                                             
>>                                                          
>>  
>> * Mainly Kibana3 (queries ES with daily alias that expands to 24 hourly 
>> indices)                                                                   
>>                                                             
>> * Jenkins jobs that constantly run and do many faceting/aggregations for 
>> the last hour's of data                                                     
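>>
>> As an illustration (the index name and field here are made up, not our
>> real schema), the kind of request these produce is a terms facet, which
>> loads the field's fielddata into heap on every shard it touches:
>>
>> ```
>> curl -s 'localhost:9200/logs-2014.10.20.15/_search' -d '{
>>   "size": 0,
>>   "facets": {
>>     "levels": { "terms": { "field": "level" } }
>>   }
>> }'
>> ```
>>
>> With a daily alias expanding to 24 hourly indices, each such query can
>> pull fielddata for all 24 indices into the heap, where it stays until
>> evicted.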
>>
>> ### Things we've tried (unsuccessfully)
>>
>> * GC settings
>>   * young/old ratio
>>     * Set the young/old ratio to 50/50, hoping that objects would get
>> GCed before having the chance to move to old.
>>     * The old gen grew at a slower rate, but still things could not be
>> collected.
>>   * survivor space ratio
>>     * Gave survivor space a higher ratio of young.
>>     * Increased the number of generations before tenuring to old to
>> 10 (up from 6).
>>   * Lower CMS occupancy ratio
>>     * Tried 60%, hoping to kick off GC earlier. GC kicked in earlier
>> but still could not collect.
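>>
>> For reference, the knobs above correspond to HotSpot flags roughly as
>> follows (the values shown are illustrative, not our exact settings):
>>
>> ```
>> -XX:NewRatio=1               # young:old = 1:1 (the 50/50 split)
>> -XX:SurvivorRatio=4          # give survivor spaces a larger share of young
>> -XX:MaxTenuringThreshold=10  # tenure to old after 10 young GCs (up from 6)
>> -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly
>> ```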
>>
>> * Limit filter/field cache
>>   * indices.fielddata.cache.size: 32GB
>>   * indices.cache.filter.size: 4GB
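>>
>> One way to check whether those caps are actually being respected is the
>> node stats API (this exists in 1.0, though the exact response layout may
>> differ between versions):
>>
>> ```
>> curl -s 'localhost:9200/_nodes/stats/indices?pretty'
>> # look at fielddata.memory_size_in_bytes / evictions and
>> # filter_cache.memory_size_in_bytes per node
>> ```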
>>
>> * Optimizing the index to 1 segment on the 3rd hour
>> * Limit JVM to 32gb ram
>>   * reference:
>> http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/_limiting_memory_usage.html
>> * Limit JVM to 65gb ram
>>   * This fulfils the 'leave 50% to the OS' principle.
>>
>> * Read "90.5/7 OOM errors -- memory leak or GC problems?"
>>   * https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/memory$20leak/elasticsearch/_Zve60xOh_E/N13tlXgkUAwJ
>>   * But we're not using term filters
>>
>> ### 32 GB Heap
>>
>> <https://lh3.googleusercontent.com/-T6uUeqSFhns/VEUMsYWwukI/AAAAAAAABm0/eayuSevxWNY/s1600/es_32gb.png>
>>
>> ### 65 GB Heap
>>
>> <https://lh4.googleusercontent.com/-C9ScRI9pO2A/VEUM6uxcJ-I/AAAAAAAABm8/iGqqKemt4aw/s1600/es_65gb.png>
>>
>> ### 65 GB Heap with changed young/old ratio
>>
>> <https://lh4.googleusercontent.com/-Ugzr4PQv_uE/VEUNGy-zguI/AAAAAAAABnE/FbnhnVHQQ20/s1600/es_65gb_yo.png>
>>
>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/33c2b354-975d-4340-ad77-4d675e577339%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/elasticsearch/33c2b354-975d-4340-ad77-4d675e577339%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> -- 
> Adrien Grand
>  
