Hi Nik,

Thanks, that was spot on! Clearing the field cache immediately brought 
CPU usage back down to normal. If I understand the documentation 
correctly, the field cache size is unbounded by default, so I'll look 
into setting a limit. It seems the field cache size limit can't be set 
through the cluster settings API, so I guess I'll have to set it in the 
configuration file and restart the ES instances.
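
For the record, here's roughly what I'm planning to put in 
elasticsearch.yml (just a sketch, and I'm assuming the relevant setting 
in 0.90.x is indices.fielddata.cache.size; correct me if I've got the 
name wrong):

    # elasticsearch.yml
    # Cap the field data cache at a percentage of the JVM heap
    indices.fielddata.cache.size: 30%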

Thanks again!

Magnus

On Wednesday, February 26, 2014 5:46:52 PM UTC+1, Nikolas Everett wrote:
>
> Check to see how much GC you are doing when it spikes.  If it is high, try 
> to clear the cache: 
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-clearcache.html
>   
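> For example, something like this should show per-node GC counts and 
> times (from memory, so double-check the flags against the 0.90 docs): 
>
>     curl -s 'http://localhost:9200/_nodes/stats?clear=true&jvm=true&pretty'
>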
> I'd try clearing each cache one at a time to see which one helps.  If that 
> turns out to be the problem, you can configure Elasticsearch to limit the 
> size of those caches to some percent of the heap.
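>
> Clearing them one at a time might look something like this (off the top 
> of my head; the exact parameter names are on the page above): 
>
>     # clear only the filter cache
>     curl -XPOST 'http://localhost:9200/_cache/clear?filter=true'
>     # clear only the field data cache
>     curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'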
>
> Nik
>
>
> On Wed, Feb 26, 2014 at 11:41 AM, Magnus Hyllander 
> <[email protected]> wrote:
>
>>  I have an ES 0.90.11 cluster with three nodes (d0, d1, d2), each with 4 
>> cores and 7GB of memory, running Ubuntu and JDK 7u45. The ES instances are 
>> all master+data, configured with a 3.5GB heap size, and are pretty much 
>> running a vanilla configuration. Logstash is currently storing on average 
>> 200 logs per second to the cluster, and we use Kibana as a frontend. 
>> Usually when the cluster is started the nodes run at around 20% CPU. 
>> However, after some time one or more of the nodes will jump up to around 
>> 90-100% CPU, and there they stay for what appears to be forever (until I 
>> get tired of waiting and restart them).
>>  
>> Using "top -H" I can see that there is one thread in each Elasticsearch 
>> process that is using most of the CPU. Here are examples from two of the 
>> nodes:
>>
>> Node d1:
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>> 41969 elastic   20   0 5814m 3.5g  11m R  82.8 52.0   1036:30 java
>> 45601 elastic   20   0 5814m 3.5g  11m S  31.9 52.0  23:02.45 java
>> 41965 elastic   20   0 5814m 3.5g  11m S  19.1 52.0  25:25.97 java
>> 41966 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:25.95 java
>> 41967 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.10 java
>> 41968 elastic   20   0 5814m 3.5g  11m S  12.7 52.0  25:23.27 java
>> 45810 elastic   20   0 5814m 3.5g  11m S   6.4 52.0  22:59.55 java
>>
>> Node d2:
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>> 40604 elastic   20   0 5812m 3.6g  11m R  99.9 53.2 926:23.96 java
>> 41487 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   4:35.11 java
>> 42443 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:03.65 java
>> 42446 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  47:05.12 java
>> 42447 elastic   20   0 5812m 3.6g  11m S   6.5 53.2  46:38.30 java
>> 31827 elastic   20   0 5812m 3.6g  11m S   6.5 53.2   0:00.59 java
>>  
>> As you can see, there is one thread in each process that seems to be 
>> running amok.
>>
>> I have tried the _nodes/hot_threads API to see which thread is using the 
>> CPU, but I can't identify any single thread with the same CPU percentage 
>> that top reports. In addition, I have tried using jstack to dump the 
>> threads, but the stack dump doesn't even list a thread with the PID that 
>> top reports.
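>>
>> (Possibly I'm just matching them wrong: as far as I know, jstack prints 
>> native thread IDs in hex in its "nid" field, so a thread PID from top 
>> has to be converted before searching the dump, along these lines: 
>>
>>     printf 'nid=0x%x\n' 41969   # busy thread PID from top -> nid=0xa3f1
>>     jstack <elasticsearch-pid> | grep 'nid=0xa3f1'
>>
>> where <elasticsearch-pid> is the PID of the main java process.)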
>>  
>> Here are a couple of charts showing the CPU user percentage:
>>
>>
>> <https://lh5.googleusercontent.com/-Clcdm5Zh5Ps/Uw4YZLI6BmI/AAAAAAAAEVE/eYINhJP3ACo/s1600/Image.png>
>>  
>>
>> As you can see, all the nodes went from 20% to 100% at around 3 PM. At 
>> midnight I got tired of waiting and restarted ES, one node at a time.
>>
>> The next chart is from some hours later:
>>
>>
>> <https://lh6.googleusercontent.com/-j5Fb3d-GxHU/Uw4YdvfNbaI/AAAAAAAAEVM/3c1g-ztRA18/s1600/Image.png>
>>  
>>
>> In this case the nodes' CPU usage increased at different points in time.
>>
>> CPU iowait remains low (5-10%) the whole time.
>>
>> I'm thinking that maybe this behavior is triggered by large queries, but 
>> I don't have a specific test case that reproduces it.
>>
>> So, what can I do to find out what is going on? Any help would be greatly 
>> appreciated!
>>
>> Regards,
>> Magnus Hyllander
>>  
>
>

