Hi guys,

Hope one of you can help. In our prod environment we have a cluster of 5 data nodes (data: true, master: false) plus 3 master nodes (master: true, data: false), running Elasticsearch 1.4.4 on Oracle Java 1.8.0_40. Data nodes have 30GB of memory, masters 15GB.

We have a problem where heap usage on some nodes crosses the heap limit and the whole cluster comes to a stop. This happens on maybe one or two nodes, while the others are still OK. No out-of-memory errors are logged, but on the nodes that are still alive you can see errors like "No search context for id [xxxxx]". I need to restart the whole cluster for it to become responsive again.

Looking at heap usage, it behaves properly for a while, showing a nice sawtooth pattern, but after about a day the heap on one node starts climbing without ever dropping, until it crosses the limit.
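
For anyone who wants to watch the same numbers outside the graphs, here is a minimal sketch of how per-node heap usage could be polled. It assumes the nodes stats API (/_nodes/stats/jvm) returns a layout of nodes.<id>.jvm.mem.heap_used_percent; the base URL and the 85% threshold are placeholders, not values from our setup.

```python
import json
from urllib.request import urlopen


def fetch_jvm_stats(base_url="http://localhost:9200"):
    """Fetch per-node JVM stats. base_url is a placeholder for any node in the cluster."""
    with urlopen(base_url + "/_nodes/stats/jvm") as resp:
        return json.loads(resp.read().decode("utf-8"))


def nodes_over_heap_threshold(stats, threshold_percent=85):
    """Return (node_name, heap_used_percent) pairs at or above the threshold, highest first."""
    flagged = []
    for node_id, node in stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_percent")
        if used is None:
            # Assumed fallback: derive the percentage from the byte counters.
            used_bytes = mem.get("heap_used_in_bytes")
            max_bytes = mem.get("heap_max_in_bytes")
            if used_bytes is None or not max_bytes:
                continue
            used = 100 * used_bytes // max_bytes
        if used >= threshold_percent:
            # Fall back to the node id when the node has no name in the response.
            flagged.append((node.get("name", node_id), used))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Example usage against a live cluster (not run here):
#   for name, pct in nodes_over_heap_threshold(fetch_jvm_stats()):
#       print("%s: heap at %d%%" % (name, pct))
```

Run periodically, something like this would show which node stops following the sawtooth pattern before it reaches the limit.
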
You can see some of this in this graph of one of our crashes:

<https://lh4.googleusercontent.com/-_08MBFfBKbM/VQ_oaPJAh3I/AAAAAAACFdI/fOt75itKbDE/s1600/Screen%2BShot%2B2015-03-23%2Bat%2B10.12.28.png>

I also notice that CPU usage peaks when that rise starts.

In elasticsearch.yml I don't have many important settings other than bootstrap.mlockall: true. In the environment variables file I have:

ES_HEAP_SIZE=15342m
MAX_OPEN_FILES=65535
MAX_LOCKED_MEMORY=unlimited
MAX_MAP_COUNT=262144

Memory usage on the nodes seems to be fine, with around 6GB free at all times (even during the crashes). Field data stays around 300MB, while the filter cache is 1.5GB (10% of the heap, the default).

<https://lh3.googleusercontent.com/-rKO9_C32pvY/VQ_qUKfXwAI/AAAAAAACFdU/KLkdKHFftKU/s1600/Screen%2BShot%2B2015-03-23%2Bat%2B10.25.25.png>

(In that graph you can see the filter cache size on 2 nodes going up at the end. That's when I increased it to 25% on those 2 nodes, but the effect was the same: the cluster crashed the same way.)

I wonder if this is related to https://github.com/elastic/elasticsearch/issues/8249, but that seems to have been fixed by 1.4.4.

Any help will be greatly appreciated.

Kind regards,
Jose
