Hi Nik,

Thanks, that was spot on! Clearing the field cache immediately brought CPU usage back down to normal. If I understand the documentation correctly, the field cache size is unbounded by default, so I'll look into setting a limit. It seems you can't set the field cache size limit through the cluster settings API, so I guess I'll have to set it in the configuration file and restart the ES instances.
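[Editor's note: a minimal sketch of the static setting being discussed, assuming the 0.90.x setting name `indices.fielddata.cache.size` and the illustrative value 30% — verify both against the docs for your exact version.]

```yaml
# config/elasticsearch.yml (on each node; static setting, requires a restart)
# Cap the field data cache at a share of heap, or an absolute size like "1gb".
indices.fielddata.cache.size: 30%
```

For the immediate relief described above, the equivalent one-off call (again assuming the 0.90.x clear-cache API flag) would be something like `curl -XPOST 'localhost:9200/_cache/clear?field_data=true'`.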
Thanks again!
Magnus

On Wednesday, February 26, 2014 5:46:52 PM UTC+1, Nikolas Everett wrote:
>
> Check to see how much GC you are doing when it spikes. If it is high, try
> to clear the cache:
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html
>
> I'd try clearing each cache one at a time to see which one helps. If that
> is the problem, you can configure Elasticsearch to limit the size of those
> caches to some percent of heap.
>
> Nik
>
> On Wed, Feb 26, 2014 at 11:41 AM, Magnus Hyllander <[email protected]> wrote:
>
>> I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), each with 4 cores
>> and 7 GB of memory, running Ubuntu and JDK 7u45. The ES instances are all
>> master+data, configured with a 3.5 GB heap. They are pretty much running
>> a vanilla configuration. Logstash is currently storing on average 200 logs
>> per second to the cluster, and we use Kibana as a frontend. Usually when
>> the cluster is started the nodes run at around 20% CPU. However, after some
>> time, one or more of the nodes will jump up to around 90-100% CPU, and
>> there they stay for what appears to be forever (until I tire and restart
>> them).
>>
>> Using "top -H" I can see that there is one thread in each elasticsearch
>> process that is using most of the CPU.
>> Here are examples from two of the nodes:
>>
>> Node d1:
>>
>>   PID USER    PR NI VIRT  RES  SHR S %CPU %MEM   TIME+   COMMAND
>> 41969 elastic 20  0 5814m 3.5g 11m R 82.8 52.0 1036:30   java
>> 45601 elastic 20  0 5814m 3.5g 11m S 31.9 52.0   23:02.45 java
>> 41965 elastic 20  0 5814m 3.5g 11m S 19.1 52.0   25:25.97 java
>> 41966 elastic 20  0 5814m 3.5g 11m S 12.7 52.0   25:25.95 java
>> 41967 elastic 20  0 5814m 3.5g 11m S 12.7 52.0   25:23.10 java
>> 41968 elastic 20  0 5814m 3.5g 11m S 12.7 52.0   25:23.27 java
>> 45810 elastic 20  0 5814m 3.5g 11m S  6.4 52.0   22:59.55 java
>>
>> Node d2:
>>
>>   PID USER    PR NI VIRT  RES  SHR S %CPU %MEM   TIME+   COMMAND
>> 40604 elastic 20  0 5812m 3.6g 11m R 99.9 53.2  926:23.96 java
>> 41487 elastic 20  0 5812m 3.6g 11m S  6.5 53.2    4:35.11 java
>> 42443 elastic 20  0 5812m 3.6g 11m S  6.5 53.2   47:03.65 java
>> 42446 elastic 20  0 5812m 3.6g 11m S  6.5 53.2   47:05.12 java
>> 42447 elastic 20  0 5812m 3.6g 11m S  6.5 53.2   46:38.30 java
>> 31827 elastic 20  0 5812m 3.6g 11m S  6.5 53.2    0:00.59 java
>>
>> As you can see, there is one thread in each process that seems to be
>> running amok.
>>
>> I have tried to use the _nodes/hot_threads API to see which thread is
>> using the CPU, but I can't identify any single thread with the same CPU
>> percentage that top reports. In addition, I have tried using jstack to dump
>> the threads, but the stack dump doesn't even list the thread with the
>> thread PID from top.
>>
>> Here are a couple of charts showing the CPU user percentage:
>>
>> <https://lh5.googleusercontent.com/-Clcdm5Zh5Ps/Uw4YZLI6BmI/AAAAAAAAEVE/eYINhJP3ACo/s1600/Image.png>
>>
>> As you can see, all the nodes went from 20% to 100% at around 3 PM. At
>> midnight I got tired of waiting and restarted ES, one node at a time.
>>
>> The next chart is from some hours later:
>>
>> <https://lh6.googleusercontent.com/-j5Fb3d-GxHU/Uw4YdvfNbaI/AAAAAAAAEVM/3c1g-ztRA18/s1600/Image.png>
>>
>> In this case the nodes' CPU usage increased at different points in time.
>>
>> CPU iowait remains low (5-10%) the whole time.
>>
>> I'm thinking that maybe this behavior is triggered by large queries, but
>> I don't have a specific test case that triggers it.
>>
>> So, what can I do to find out what is going on? Any help would be greatly
>> appreciated!
>>
>> Regards,
>> Magnus Hyllander
>>
>> --
>> You received this message because you are subscribed to the Google Groups "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>> To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6af1c79d-8402-4de6-9ec2-07893c6b54f2%40googlegroups.com.
>> For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/65c9dd2e-e441-4fe4-ada0-c517b55336f9%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
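[Editor's note on the jstack mismatch described in the quoted message: `top -H` prints thread IDs in decimal, while jstack labels each thread with its native ID in hexadecimal (`nid=0x...`), so the decimal ID from top will never appear verbatim in the dump. Converting it to hex first (using the hot thread ID 41969 from the d1 listing as the example) makes the thread findable. If the hex nid still does not match any Java thread, the busy thread is likely a native VM thread such as a GC worker, which jstack does not show with a Java stack — consistent with Nik's GC theory.]

```shell
# top -H shows decimal thread IDs; jstack tags threads as nid=0x<hex>.
# Convert the hot thread's decimal ID to the hex token to search for.
printf 'nid=0x%x\n' 41969    # -> nid=0xa3f1

# Then search the stack dump for it, e.g.:
#   jstack <jvm-pid> | grep "$(printf 'nid=0x%x' 41969)"
```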
