It sounds like you are running into GC problems, which is inevitable when your cluster is at capacity. A few things:
You're running Java with a heap larger than 32GB, which means your object pointers are no longer compressed; this can, and probably will, adversely impact GC.

What ES version are you on, which Java version and release, what are your node specs, and how many indexes do you have and how large are they?

Make sure you're monitoring your cluster with plugins like ElasticHQ or Marvel to give you insight into what is happening.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 23 June 2014 04:44, Klavs Klavsen <[email protected]> wrote:

> Hi guys,
>
> I've got an ES cluster of two data nodes and one no-data node (serving the
> kibana website). It receives approx. 40 million loglines a day, and normally
> has no issue with this.
> If I stop reading in for a short time - and start again - the queue is
> emptied about 50x faster than it is filled.
>
> We've had several different issues, and have fixed up nprocs and tuned
> elasticsearch.yml - which have helped, but ES (since 1.1.2 - which might
> be a coincidence though) suddenly gets an immense slowdown - which makes
> the queue fill up. If I then stop everything and restart ES, then LS - it
> usually picks back up. Sometimes I have to do it several times.
>
> The only thing that seems to increase in elasticsearch logs, around when
> this happens is this message:
> [2014-06-22 20:23:02,612][WARN ][transport ]
> [p-elasticlog02] Received response for a request that has timed out, sent
> [44943ms] ago, timed out [14943ms] ago, action
> [discovery/zen/fd/masterPing], node
> [[p-elasticlog03][JlyflI1AT6WJHh5fsk311w][p-elasticlog03.example.dk
> ][inet[/10.223.156.18:9300]]{master=true}], id [23927]
>
> In the second node in the cluster (which seemed to be the cause)
> there were GC messages, and I had to bring down the entire cluster to make
> it start running properly again (I could perhaps just have restarted the
> node writing about GC).
>
> I've set nprocs to 4096 and max open files to 65k.
>
> ES is started with: /usr/bin/java -Xms41886M -Xmx41886M
> -XX:MaxDirectMemorySize=41886M -Xss256k -Djava.awt.headless=true
> -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
> -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:+HeapDumpOnOutOfMemoryError
> -XX:HeapDumpPath=/var/lib/elasticsearch/heapdump.hprof -Delasticsearch
> -Des.pidfile=/var/run/elasticsearch/elasticsearch.pid
> -Des.path.home=/usr/share/elasticsearch -cp
> :/usr/share/elasticsearch/lib/elasticsearch-1.1.2.jar:/usr/share/elasticsearch/lib/*:/usr/share/elasticsearch/lib/sigar/*
> -Des.default.path.home=/usr/share/elasticsearch
> -Des.default.path.logs=/var/log/elasticsearch
> -Des.default.path.data=/var/lib/elasticsearch
> -Des.default.path.work=/tmp/elasticsearch
> -Des.default.path.conf=/etc/elasticsearch
> org.elasticsearch.bootstrap.Elasticsearch
>
>
> Any recommendations as to how I can try to fix this problem? It
> happens a few times a week :(
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/70c87756-f9b8-4032-9906-9a520c28801e%40googlegroups.com
> <https://groups.google.com/d/msgid/elasticsearch/70c87756-f9b8-4032-9906-9a520c28801e%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
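
To make the heap-size point in the reply concrete, here is a minimal Python sketch that pulls the -Xmx value out of a JVM command line and compares it against 32 GiB. The helper names (parse_jvm_size, max_heap_bytes, heap_exceeds_compressed_oops_limit) are hypothetical and only for illustration. The 32 GiB constant is an approximation: the exact heap size at which HotSpot stops compressing object pointers is somewhat lower and depends on the JVM version and settings, so treat a result near the limit with caution.

```python
import re

# Approximate threshold (assumption): HotSpot generally compresses object
# pointers only for heaps below roughly 32 GiB. The real cutoff is a bit
# lower and varies with JVM version and settings.
COMPRESSED_OOPS_LIMIT_BYTES = 32 * 1024**3

_UNIT_MULTIPLIERS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_jvm_size(value):
    """Convert a JVM size string such as '41886M' or '31g' to bytes."""
    match = re.fullmatch(r"(\d+)([kKmMgGtT]?)", value)
    if match is None:
        raise ValueError(f"unrecognised JVM size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_MULTIPLIERS[unit.lower()]


def max_heap_bytes(command_line):
    """Return the -Xmx value in bytes, or None if the flag is absent.

    If -Xmx appears several times, the last one is used (assumption: the
    JVM lets the last occurrence win).
    """
    sizes = re.findall(r"(?:^|\s)-Xmx(\S+)", command_line)
    return parse_jvm_size(sizes[-1]) if sizes else None


def heap_exceeds_compressed_oops_limit(command_line):
    """True if the configured max heap is at or above the approximate limit."""
    heap = max_heap_bytes(command_line)
    return heap is not None and heap >= COMPRESSED_OOPS_LIMIT_BYTES


if __name__ == "__main__":
    # Shortened version of the command line quoted in the original message.
    cmd = "/usr/bin/java -Xms41886M -Xmx41886M -XX:MaxDirectMemorySize=41886M -Xss256k"
    print(heap_exceeds_compressed_oops_limit(cmd))
```

With the 41886M heap from the quoted command line this reports that the limit is exceeded. On a real node it is more reliable to ask the JVM itself, for example by running java with the same -Xmx value plus -XX:+PrintFlagsFinal -version and looking at the UseCompressedOops line in the output.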
