Can anything be seen in a thread dump that looks like stray queries? Maybe some facet queries hung while resources ran low and never returned?
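One way to check this is to dump the Elasticsearch JVM with the JDK's `jstack <pid> > dump.txt` and look for search-pool threads that are still busy. The sketch below is a hypothetical helper (the class name `StuckSearchThreads` is made up). It assumes Elasticsearch names its pool threads like `elasticsearch[<node>][search][T#<n>]` and that the dump uses the HotSpot `java.lang.Thread.State:` line. Treating RUNNABLE as "busy" is only a heuristic, because a hung query could also be BLOCKED or WAITING on a lock.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans a jstack-style thread dump for search-pool threads
// that are still RUNNABLE. Assumes Elasticsearch pool threads are named like
// elasticsearch[<node>][search][T#<n>] (an assumption, check your own dump).
public class StuckSearchThreads {

    /** Returns the dump entries of RUNNABLE threads whose name contains "[search]". */
    public static List<String> runnableSearchThreads(String dump) {
        List<String> hits = new ArrayList<>();
        // Thread entries in a jstack dump are separated by blank lines.
        for (String block : dump.split("\\R\\s*\\R")) {
            String entry = block.trim();
            // Each thread entry starts with its quoted name; skip headers and
            // "Locked ownable synchronizers" sections.
            if (!entry.startsWith("\"")) {
                continue;
            }
            int nameEnd = entry.indexOf('"', 1);
            if (nameEnd < 0) {
                continue;
            }
            String name = entry.substring(1, nameEnd);
            boolean isSearchPool = name.contains("[search]");
            boolean isRunnable = entry.contains("java.lang.Thread.State: RUNNABLE");
            if (isSearchPool && isRunnable) {
                hits.add(entry);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java StuckSearchThreads <dump.txt>");
            System.exit(1);
        }
        // Dump produced beforehand with e.g. `jstack <pid> > dump.txt`.
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        List<String> hits = runnableSearchThreads(dump);
        System.out.println(hits.size() + " runnable search thread(s)");
        for (String entry : hits) {
            System.out.println(entry);
            System.out.println();
        }
    }
}
```

Take two or three dumps a few seconds apart. If the same search threads keep sitting in the same facet or Lucene frames, they are good candidates for hung queries. Elasticsearch also exposes a similar view over REST through `GET /_nodes/hot_threads`.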
Jörg

On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic <[email protected]> wrote:

> Having an issue on one of my clusters running version 1.1.1 with 8
> master/data nodes, unicast, connecting via the Java TransportClient. A few
> REST queries are executed via monitoring services.
>
> Currently there is almost no traffic on this cluster. The few queries that
> are currently running are either small test queries or large facet queries
> (which are infrequent and the longest runs for 16 seconds). What I am
> noticing is that the active search thread count on some nodes never
> decreases, and when it reaches the limit, the entire cluster stops accepting
> requests. The current max is the default (3 x 8).
>
> http://search06:9200/_cat/thread_pool
>
> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
> search07 1.1.1.7 0 0 0 0 0 0 0 0 0
> search08 1.1.1.8 0 0 0 0 0 0 0 0 0
> search09 1.1.1.9 0 0 0 0 0 0 0 0 0
> search11 1.1.1.11 0 0 0 0 0 0 0 0 0
> search06 1.1.1.6 0 0 0 0 0 0 2 0 0
> search10 1.1.1.10 0 0 0 0 0 0 0 0 0
> search12 1.1.1.12 0 0 0 0 0 0 0 0 0
>
> In this case, both search05 and search06 have an active thread count that
> does not change. If I run a query against search05, the search responds
> quickly and the total number of active search threads does not increase.
>
> So I have two related issues:
> 1) the active thread count does not decrease
> 2) the cluster will not accept requests if one node becomes unstable.
>
> I have seen the issue intermittently in the past, but it has started
> again and cluster restarts do not fix the problem. At the log level, there
> have been issues with the cluster state not propagating. Not every node
> acknowledges the cluster state ([discovery.zen.publish ] received cluster
> state version NNN) and the master logs a timeout (awaiting all nodes to
> process published state NNN timed out, timeout 30s).
> The nodes are fine and can ping each other with no issues.
> Currently not seeing any log errors with the thread pool issue, so perhaps
> it is a red herring.
>
> Cheers,
>
> Ivan
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91LEXP0NxbgC4-mVR27DX%2BuOxyor5cqiM6ie2JExBw%40mail.gmail.com
> <https://groups.google.com/d/msgid/elasticsearch/CALY%3DcQCx91LEXP0NxbgC4-mVR27DX%2BuOxyor5cqiM6ie2JExBw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
