Yes, actionGet() can be traced down to AbstractQueuedSynchronizer's acquireSharedInterruptibly(-1) call
(http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int))
in org.elasticsearch.common.util.concurrent.BaseFuture, which "waits" forever until interrupted. But there
are twin methods, like actionGet(long millis), that time out.

Jörg
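For illustration only (this snippet is not from the thread), a minimal sketch of the two variants against a
1.x Java client could look like the following; the client instance, the index name and the 10-second timeout
are assumptions:

// Minimal sketch, not from the thread: `client`, the index name and the
// timeout value are assumptions for illustration.
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

public class ActionGetTimeoutSketch {

    public static SearchResponse searchWithTimeout(Client client) {
        // execute() returns a future; calling actionGet() with no argument
        // parks the calling thread until the future completes (ultimately in
        // AbstractQueuedSynchronizer.acquireSharedInterruptibly), i.e. forever
        // if the node never answers.
        //
        // The timed overloads below give up after the given interval instead
        // and fail with a timeout exception, releasing the calling thread.
        return client.prepareSearch("myindex")
                .setQuery(QueryBuilders.matchAllQuery())
                .execute()
                .actionGet(TimeValue.timeValueSeconds(10)); // or actionGet(10000) for millis
    }
}

The no-argument call is the one that blocks indefinitely; the timed overloads throw a timeout exception
(ElasticsearchTimeoutException in 1.x, if I recall correctly) once the interval elapses.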
On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic <[email protected]> wrote:

> Still analyzing all the logs and dumps that I have accumulated so far, but
> it looks like the blocking socket appender might be the issue. After that
> node exhausts all of its search threads, the TransportClient will still
> issue requests to it, although other nodes do not have issues. After a
> while, the client application will also be blocked waiting for
> Elasticsearch to return.
>
> I removed logging for now and will re-implement it with a service that
> reads directly from the duplicate file-based log. Although I have a timeout
> specific to my query, my recollection of the search code is that it only
> applies to the Lucene LimitedCollector (it's been a while since I looked at
> that code). The next step should be to add an explicit timeout
> to actionGet(). Is the default basically no wait?
>
> It might be a challenge for the cluster engine to not delegate queries to
> overloaded servers.
>
> Cheers,
>
> Ivan
>
> On Sun, Jul 6, 2014 at 2:36 PM, [email protected] <[email protected]> wrote:
>
>> Yes, the socket appender blocks. Maybe the async appender of log4j can do
>> better ...
>>
>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>
>> Jörg
>>
>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic <[email protected]> wrote:
>>
>>> Forgot to mention the thread dumps. I have taken them before, but not
>>> this time. Most of the blocked search thread pools are stuck in log4j.
>>>
>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>
>>> I do have a socket appender to logstash (elasticsearch logs in
>>> elasticsearch!). Let me debug this connection.
>>>
>>> --
>>> Ivan
>>>
>>> On Sun, Jul 6, 2014 at 1:55 PM, [email protected] <[email protected]> wrote:
>>>
>>>> Can anything be seen in a thread dump that looks like stray queries?
>>>> Maybe some facet queries hung while resources went low and never
>>>> returned?
>>>>
>>>> Jörg
>>>>
>>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic <[email protected]> wrote:
>>>>
>>>>> Having an issue on one of my clusters running version 1.1.1 with 8
>>>>> master/data nodes, unicast, connecting via the Java TransportClient. A few
>>>>> REST queries are executed via monitoring services.
>>>>>
>>>>> Currently there is almost no traffic on this cluster. The few queries
>>>>> that are currently running are either small test queries or large facet
>>>>> queries (which are infrequent, and the longest runs for 16 seconds). What
>>>>> I am noticing is that the number of active search threads on some nodes
>>>>> never decreases, and when it reaches the limit, the entire cluster will
>>>>> stop accepting requests. The current max is the default (3 x 8).
>>>>>
>>>>> http://search06:9200/_cat/thread_pool
>>>>>
>>>>> search05 1.1.1.5  0 0 0 0 0 0 19 0 0
>>>>> search07 1.1.1.7  0 0 0 0 0 0  0 0 0
>>>>> search08 1.1.1.8  0 0 0 0 0 0  0 0 0
>>>>> search09 1.1.1.9  0 0 0 0 0 0  0 0 0
>>>>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>>>>> search06 1.1.1.6  0 0 0 0 0 0  2 0 0
>>>>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>>>>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>>>>
>>>>> In this case, both search05 and search06 have an active thread count
>>>>> that does not change.
>>>>> If I run a query against search05, the search will
>>>>> respond quickly and the total number of active search threads does not
>>>>> increase.
>>>>>
>>>>> So I have two related issues:
>>>>> 1) the active thread count does not decrease
>>>>> 2) the cluster will not accept requests if one node becomes unstable.
>>>>>
>>>>> I have seen the issue intermittently in the past, but it has started
>>>>> again and cluster restarts do not fix the problem. At the log level,
>>>>> there have been issues with the cluster state not propagating. Not
>>>>> every node will acknowledge the cluster state ([discovery.zen.publish]
>>>>> received cluster state version NNN) and the master would log a timeout
>>>>> (awaiting all nodes to process published state NNN timed out, timeout 30s).
>>>>> The nodes are fine and can ping each other with no issues. Currently I am
>>>>> not seeing any log errors with the thread pool issue, so perhaps it is a
>>>>> red herring.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ivan
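On the blocking socket appender discussed above: a rough sketch of the async appender idea from the quoted
thread, wiring log4j 1.2's AsyncAppender in front of the existing SocketAppender programmatically, could look
like the following. The host, port and buffer settings are made-up placeholders, not values from the thread:

// Hedged sketch only; "logstash.example.com", the port and the buffer size
// are placeholders, not values taken from the thread.
import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncSocketLoggingSketch {

    public static void install() {
        // A plain SocketAppender writes on the calling thread, so a stalled
        // TCP connection to the log receiver can block whichever search
        // thread happens to be logging at that moment.
        SocketAppender socket = new SocketAppender("logstash.example.com", 4560);

        // AsyncAppender buffers events and ships them from its own dispatcher
        // thread, so the logging call returns immediately.
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(512);   // events buffered in memory
        async.setBlocking(false);   // summarize/drop instead of blocking when the buffer fills
        async.addAppender(socket);

        Logger.getRootLogger().addAppender(async);
    }
}

The same wrapping is normally done in log4j's XML configuration (as far as I recall, the 1.2 properties
format cannot express an AsyncAppender); the programmatic form is only used here to keep the sketch
self-contained.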
