Check to see how much GC you are doing when it spikes. If it is high, try to clear the cache: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html I'd try clearing each cache one at a time to see which one helps. If that is the problem you can configure Elasticsearch to limit the size of those caches to some percent of heap.
Nik On Wed, Feb 26, 2014 at 11:41 AM, Magnus Hyllander < [email protected]> wrote: > I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), with 4 cores > and 7GB memory, running Ubuntu and JDK 7u45. The ES instances are all > master+data, configured with 3.5GB heap size. They are pretty much running > a vanilla configuration. Logstash is currently storing on average 200 logs > per second to the cluster, and we use kibana as a frontend. Usually when > teh cluster is started the nodes run at around 20% cpu. However after some > time, one or more of the nodes will jump up to around 90-100% cpu. And > there they stay for what appears to be forever (until I tire and restart > them). > > Using "top -H" I can see that there is one thread in each elasticsearch > process that is using most of the cpu. Here are examples from two of the > nodes: > > Node d1: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 41969 elastic 20 0 5814m 3.5g 11m R 82.8 52.0 1036:30 java > 45601 elastic 20 0 5814m 3.5g 11m S 31.9 52.0 23:02.45 java > 41965 elastic 20 0 5814m 3.5g 11m S 19.1 52.0 25:25.97 java > 41966 elastic 20 0 5814m 3.5g 11m S 12.7 52.0 25:25.95 java > 41967 elastic 20 0 5814m 3.5g 11m S 12.7 52.0 25:23.10 java > 41968 elastic 20 0 5814m 3.5g 11m S 12.7 52.0 25:23.27 java > 45810 elastic 20 0 5814m 3.5g 11m S 6.4 52.0 22:59.55 java > > Node d2: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 40604 elastic 20 0 5812m 3.6g 11m R 99.9 53.2 926:23.96 java > 41487 elastic 20 0 5812m 3.6g 11m S 6.5 53.2 4:35.11 java > 42443 elastic 20 0 5812m 3.6g 11m S 6.5 53.2 47:03.65 java > 42446 elastic 20 0 5812m 3.6g 11m S 6.5 53.2 47:05.12 java > 42447 elastic 20 0 5812m 3.6g 11m S 6.5 53.2 46:38.30 java > 31827 elastic 20 0 5812m 3.6g 11m S 6.5 53.2 0:00.59 java > > As you can see there is one thread in each process that seems to be > running amok. > > I have tried to use the _nodes/hot_threads API to see which thread is > using the cpu, but I can't identify any single thread with the same cpu > percentage that top reports. In addition, I have tried using jstack to dump > the threads, but the stack dump doesn't even list the thread with the > thread PID from top. > > Here are a couple of charts showing the cpu user percentage: > > > <https://lh5.googleusercontent.com/-Clcdm5Zh5Ps/Uw4YZLI6BmI/AAAAAAAAEVE/eYINhJP3ACo/s1600/Image.png> > > > As you can see all the nodes went from 20% to 100% at around 3 PM. At > midnight I got tired of waiting and restarted ES, one node at a time. > > The next chart is from some hours later: > > > <https://lh6.googleusercontent.com/-j5Fb3d-GxHU/Uw4YdvfNbaI/AAAAAAAAEVM/3c1g-ztRA18/s1600/Image.png> > > > In this case the nodes' cpu usage increased at different points in time. > > Cpu iowait remains low (5-10%) the whole time. > > I'm thinking that maybe this behavior is triggered by large queries, but I > don't have a specific test case that triggers it. > > So, what can I do to find out what is going on? Any help would be greatly > appreciated! > > Regards, > Magnus Hyllander > > -- > You received this message because you are subscribed to the Google Groups > "elasticsearch" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To view this discussion on the web visit > https://groups.google.com/d/msgid/elasticsearch/6af1c79d-8402-4de6-9ec2-07893c6b54f2%40googlegroups.com > . > For more options, visit https://groups.google.com/groups/opt_out. > -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAPmjWd2w7MEmj7HhC8uBbrvqgKyESXvMeSg7yzMcVTeRZXAi7Q%40mail.gmail.com. For more options, visit https://groups.google.com/groups/opt_out.
