Try looking into hot_threads?
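
Something along these lines should show what each node is busy doing (a rough
sketch, assuming HTTP on the default port 9200; swap in the hostname of the
suspect node):

curl 'http://localhost:9200/_nodes/hot_threads?threads=10'
curl 'http://suspect-node:9200/_nodes/_local/hot_threads'

The first call asks every node for its hottest threads; the second hits the
suspect node directly, which can still answer even when the cluster-wide APIs
hang.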

On 13 February 2015 at 14:41, liu wei <[email protected]> wrote:

> Thanks for the reply. Today the problem happened again: a bad node stopped
> responding and brought down the whole cluster, but this time memory looks
> fine. Here are some details.
> 1. Again, management APIs such as _nodes and _cat are not returning; only
> the default 9200 endpoint responds. If I hit the master node directly, the
> default 9200 endpoint returns 200, but the other APIs still don't work.
> 2. No out-of-memory exception. We set the heap to 20GB, but usage is only
> about 15GB. (Could that be related? The machine has 32GB of memory.)
> 3. I restarted a couple of high-memory nodes, and the master too, but the
> cluster still didn't recover, until I found log entries on the master
> pointing at one node and saying the operation could not be executed on
> that bad node.
> 4. Again, the bad node's log is missing an entire time period starting a
> couple of hours ago, and in Marvel the node stopped reporting its status
> around the same time. I didn't see anything suspicious in the Marvel
> events, though. Unlike the first time, there's no obvious problem (no GC
> log entries), apart from some index operations failing. I also checked the
> field_data size this time; it isn't big, only around 1GB (the checks I ran
> are sketched below).
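>
> For reference, the kind of calls I mean (a rough sketch only, assuming the
> default HTTP port 9200; host names omitted):
>
> curl 'http://localhost:9200/_nodes/stats/jvm?pretty'   # heap used vs. max per node
> curl 'http://localhost:9200/_cat/fielddata?v'          # fielddata size per node and field
> curl 'http://localhost:9200/_cat/nodes?v'              # quick per-node overview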
>
> What can I do to pinpoint what's going on?
>
>
> On Sun, Feb 8, 2015 at 11:11 AM, Adrien Grand <
> [email protected]> wrote:
>
>> Indeed, JVMs sometimes need to "stop the world" when there is memory
>> pressure. You might find advice about GC tuning here and there, but I
>> would advise against it, as it is very hard to evaluate the impact of
>> those settings.
>>
>> If this issue happens on a regular basis, it might mean that your cluster
>> is undersized and should be given more memory so that the JVM doesn't have
>> to run full GCs so often. Otherwise, you could look at changing
>> elasticsearch's configuration so that it loads less data into memory (for
>> example by using doc values for fielddata). Another option is to run two
>> nodes per machine instead of one, each with half the memory: since full
>> GCs are shorter on smaller heaps, this should limit the issue.
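>>
>> To illustrate the doc values suggestion (a sketch only: the index, type
>> and field names below are made up, and doc values have to be enabled when
>> a field is first created, so this applies to newly created indices):
>>
>> curl -XPUT 'http://localhost:9200/logs-2015.02.08' -d '{
>>   "mappings": {
>>     "event": {
>>       "properties": {
>>         "status": { "type": "string", "index": "not_analyzed", "doc_values": true }
>>       }
>>     }
>>   }
>> }'
>>
>> For the two-nodes-per-machine option, each process would simply get about
>> half of the current heap, e.g. by starting each one with ES_HEAP_SIZE=10g.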
>>
>> On Sat, Feb 7, 2015 at 2:55 AM, liu wei <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We recently had a few incidents where a single index running low on
>>> memory impacts the entire cluster. All the cluster-related APIs stop
>>> responding, and Kibana 3 and 4 fail to load too. From the log it seems
>>> the node is doing GC and not responding to any requests, and there is no
>>> log output between 2:29 and 4:07, when I restarted the node. Is there any
>>> way to make this more resilient?
>>>
>>> [2015-02-05 14:29:17,199][INFO ][monitor.jvm              ] [Big Wheel]
>>> [gc][young][78379][36567] duration [864ms], collections [1]/[1.7s], total
>>> [864ms]/[1.4h], memory [15.2gb]->[14.6gb]/[19.9gb], all_pools {[young]
>>> [599.8mb]->[2.8mb]/[665.6mb]}{[survivor] [75.6mb]->[83.1mb]/[83.1mb]}{[old]
>>> [14.5gb]->[14.5gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:23,302][WARN ][monitor.jvm              ] [Big Wheel]
>>> [gc][young][78384][36568] duration [1.4s], collections [1]/[2s], total
>>> [1.4s]/[1.4h], memory [15.1gb]->[14.7gb]/[19.9gb], all_pools {[young]
>>> [459.7mb]->[15.7mb]/[665.6mb]}{[survivor]
>>> [83.1mb]->[83.1mb]/[83.1mb]}{[old] [14.5gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:34,990][INFO ][monitor.jvm              ] [Big Wheel]
>>> [gc][young][78395][36571] duration [900ms], collections [1]/[1.4s], total
>>> [900ms]/[1.4h], memory [15.1gb]->[14.6gb]/[19.9gb], all_pools {[young]
>>> [484.9mb]->[3.9mb]/[665.6mb]}{[survivor] [71.7mb]->[52.4mb]/[83.1mb]}{[old]
>>> [14.6gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:45,055][WARN ][monitor.jvm              ] [Big Wheel]
>>> [gc][young][78404][36574] duration [1.2s], collections [1]/[2s],
>>> total [1.2s]/[1.4h], memory [15.1gb]->[14.7gb]/[19.9gb], all_pools {[young]
>>> [472.8mb]->[2.9mb]/[665.6mb]}{[survivor] [83.1mb]->[67.6mb]/[83.1mb]}{[old]
>>> [14.6gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 16:07:15,509][INFO ][node                     ] [Pyro]
>>> version[1.4.2], pid[9796], build[927caff/2014-12-16T14:11:12Z]
>>>
>>> [2015-02-05 16:07:15,510][INFO ][node                     ] [Pyro]
>>> initializing ...
>>>
>>> [2015-02-05 16:07:15,638][INFO ][plugins                  ] [Pyro]
>>> loaded [marvel, cloud-azure], sites [marvel, kopf]
>>>
>>> [2015-02-05 16:07:24,844][INFO ][node                     ] [Pyro]
>>> initialized
>>>
>>> [2015-02-05 16:07:24,845][INFO ][node                     ] [Pyro]
>>> starting ...
>>>
>>
>>
>>
>> --
>> Adrien Grand
>>
>>
>
>

