Try looking into hot_threads?
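If the node is still up, something like this can capture what its busiest threads are doing. It is only a minimal sketch using Python's requests library; the host names, port 9200, and the threads/interval parameters are assumptions you would adapt to your cluster:

    import requests

    CLUSTER = "http://localhost:9200"         # any reachable node
    SUSPECT = "http://bad-node.example:9200"  # hypothetical address of the bad node

    # Cluster-wide hot threads (plain-text stack traces of the busiest threads).
    resp = requests.get(CLUSTER + "/_nodes/hot_threads",
                        params={"threads": 5, "interval": "1s"},
                        timeout=30)
    print(resp.text)

    # If the cluster-wide call hangs, ask the suspect node about itself only.
    resp = requests.get(SUSPECT + "/_nodes/_local/hot_threads", timeout=30)
    print(resp.text)

A node stuck in long GC pauses will often just time out here, which is a useful signal in itself; curl works just as well if you prefer it.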
On 13 February 2015 at 14:41, liu wei <[email protected]> wrote:

> Thanks for the reply. Today the problem happened again. A bad node stopped
> responding and brought down the whole cluster, but this time memory looks
> fine. Here are some details.
>
> 1. Again, management APIs such as _nodes and _cat are not returning; only
> the default response on port 9200 comes back. If I hit the master node
> directly, port 9200 returns 200, but the other APIs still don't work.
> 2. No out-of-memory exception. We set the heap to 20GB, but usage is only
> about 15GB. (Could that be part of the problem? The machine has 32GB of
> memory.)
> 3. I restarted a couple of high-memory nodes, and the master too, and the
> cluster still did not recover, until I found master node logs pointing at
> one node and saying the operation could not be executed on that bad node.
> 4. Again, the bad node's log is missing an entire time period starting a
> couple of hours earlier, and in Marvel the node stopped reporting status
> around the same time. I didn't see anything suspicious in the Marvel
> events, though. Unlike the first time, there is no obvious problem (no GC
> log entries), apart from some indexing operations failing. This time I
> also checked the fielddata size; it is not big, only around 1GB.
>
> What can I do to pinpoint what's going on?
>
> On Sun, Feb 8, 2015 at 11:11 AM, Adrien Grand <[email protected]> wrote:
>
>> Indeed, JVMs sometimes need to "stop the world" under memory pressure.
>> You might find advice about GC tuning here and there, but I would advise
>> against it, as it is very hard to evaluate the impact of those settings.
>>
>> If this issue happens on a regular basis, it might mean that your cluster
>> is undersized and should be given more memory so that the JVM doesn't
>> have to run full GCs so often. Otherwise, you should look at how you
>> could modify Elasticsearch's configuration to load less data into memory
>> (such as using doc values for fielddata). Another option is to run two
>> nodes per machine instead of one, each with half the memory; given that
>> full GCs are shorter on small heaps, this should limit the issue.
>>
>> On Sat, Feb 7, 2015 at 2:55 AM, liu wei <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We recently had a few incidents where a single node running low on
>>> memory impacted the entire cluster. None of the cluster-related APIs
>>> were responding, and Kibana 3 and 4 failed to load as well. From the log
>>> it looks like the node was doing GC and not responding to any requests,
>>> and there is no log output between 2:29pm and 4:07pm, when I restarted
>>> the node. Is there any way to make this more resilient?
>>>
>>> [2015-02-05 14:29:17,199][INFO ][monitor.jvm ] [Big Wheel] [gc][young][78379][36567] duration [864ms], collections [1]/[1.7s], total [864ms]/[1.4h], memory [15.2gb]->[14.6gb]/[19.9gb], all_pools {[young] [599.8mb]->[2.8mb]/[665.6mb]}{[survivor] [75.6mb]->[83.1mb]/[83.1mb]}{[old] [14.5gb]->[14.5gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:23,302][WARN ][monitor.jvm ] [Big Wheel] [gc][young][78384][36568] duration [1.4s], collections [1]/[2s], total [1.4s]/[1.4h], memory [15.1gb]->[14.7gb]/[19.9gb], all_pools {[young] [459.7mb]->[15.7mb]/[665.6mb]}{[survivor] [83.1mb]->[83.1mb]/[83.1mb]}{[old] [14.5gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:34,990][INFO ][monitor.jvm ] [Big Wheel] [gc][young][78395][36571] duration [900ms], collections [1]/[1.4s], total [900ms]/[1.4h], memory [15.1gb]->[14.6gb]/[19.9gb], all_pools {[young] [484.9mb]->[3.9mb]/[665.6mb]}{[survivor] [71.7mb]->[52.4mb]/[83.1mb]}{[old] [14.6gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 14:29:45,055][WARN ][monitor.jvm ] [Big Wheel] [gc][young][78404][36574] duration [1.2s], collections [1]/[2s], total [1.2s]/[1.4h], memory [15.1gb]->[14.7gb]/[19.9gb], all_pools {[young] [472.8mb]->[2.9mb]/[665.6mb]}{[survivor] [83.1mb]->[67.6mb]/[83.1mb]}{[old] [14.6gb]->[14.6gb]/[19.1gb]}
>>>
>>> [2015-02-05 16:07:15,509][INFO ][node ] [Pyro] version[1.4.2], pid[9796], build[927caff/2014-12-16T14:11:12Z]
>>> [2015-02-05 16:07:15,510][INFO ][node ] [Pyro] initializing ...
>>> [2015-02-05 16:07:15,638][INFO ][plugins ] [Pyro] loaded [marvel, cloud-azure], sites [marvel, kopf]
>>> [2015-02-05 16:07:24,844][INFO ][node ] [Pyro] initialized
>>> [2015-02-05 16:07:24,845][INFO ][node ] [Pyro] starting ...
>>
>> --
>> Adrien Grand
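
Also, on the fielddata check and the doc-values suggestion in Adrien's reply above, a rough sketch along the same lines. The index name "logs-v2", the type "event", and the field "status" are placeholders, not taken from your cluster, and doc values only apply to newly indexed data, so existing indices would have to be reindexed into the new one:

    import requests

    CLUSTER = "http://localhost:9200"

    # Per-node, per-field fielddata sizes as a plain-text table.
    print(requests.get(CLUSTER + "/_cat/fielddata?v", timeout=30).text)

    # Create a new index whose heavy field is stored as doc values (on disk)
    # instead of on-heap fielddata. Names below are hypothetical placeholders.
    mapping = {
        "mappings": {
            "event": {
                "properties": {
                    "status": {
                        "type": "string",
                        "index": "not_analyzed",
                        "doc_values": True,
                    }
                }
            }
        }
    }
    resp = requests.put(CLUSTER + "/logs-v2", json=mapping, timeout=30)
    print(resp.status_code, resp.text)

As far as I know, doc values on 1.4 only work for not_analyzed string, numeric, and date fields, so analyzed fields would still load fielddata on the heap.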
