Yeah, already traced it back myself. I've been using Elasticsearch for years and have only been setting query timeouts. Need to re-architect things to incorporate client-side timeouts.

Had two different Elasticsearch meltdowns this weekend, after a long period of stability. Both of them different and unique!

-- Ivan
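[For anyone landing on this thread: a minimal sketch of such a client-side timeout with the 1.x Java API, assuming a placeholder host and index name. setTimeout() only bounds the server-side collector; the actionGet(TimeValue) variant discussed below bounds the client's blocking wait and throws ElasticsearchTimeoutException when it expires.]

import org.elasticsearch.ElasticsearchTimeoutException;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

public class ClientTimeoutSketch {
    public static void main(String[] args) {
        TransportClient client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("search05", 9300)); // placeholder host
        try {
            SearchResponse response = client.prepareSearch("my-index") // placeholder index name
                    .setQuery(QueryBuilders.matchAllQuery())
                    // server-side timeout: bounds the Lucene collector, not the client wait
                    .setTimeout(TimeValue.timeValueSeconds(5))
                    .execute()
                    // client-side timeout: bounds the blocking wait in BaseFuture
                    .actionGet(TimeValue.timeValueSeconds(10));
            System.out.println("took " + response.getTookInMillis() + " ms");
        } catch (ElasticsearchTimeoutException e) {
            // the client gave up waiting; the node may still be busy with the request
            System.err.println("search timed out on the client side: " + e.getMessage());
        } finally {
            client.close();
        }
    }
}

[The two timeouts are independent: a node whose search threads are wedged in a blocked appender will never honor the server-side timeout, so only the client-side wait protects the application.]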
On Mon, Jul 7, 2014 at 1:50 PM, [email protected] <[email protected]> wrote:

> Yes, actionGet() can be traced down to AbstractQueuedSynchronizer's
> acquireSharedInterruptibly(-1) call
>
> http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)
>
> in org.elasticsearch.common.util.concurrent.BaseFuture, which waits
> forever until interrupted. But there are twin methods, like
> actionGet(long millis), that time out.
>
> Jörg
>
> On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic <[email protected]> wrote:
>
>> Still analyzing all the logs and dumps that I have accumulated so far,
>> but it looks like the blocking socket appender might be the issue. After
>> that node exhausts all of its search threads, the TransportClient will
>> still issue requests to it, although other nodes do not have issues.
>> After a while, the client application will also be blocked waiting for
>> Elasticsearch to return.
>>
>> I removed logging for now and will re-implement it with a service that
>> reads directly from the duplicate file-based log. Although I have a
>> timeout specific to my query, my recollection of the search code is that
>> it only applies to the Lucene LimitedCollector (it's been a while since I
>> looked at that code). The next step should be to add an explicit timeout
>> to actionGet(). Is the default basically to wait forever?
>>
>> It might be a challenge for the cluster engine to not delegate queries
>> to overloaded servers.
>>
>> Cheers,
>>
>> Ivan
>>
>> On Sun, Jul 6, 2014 at 2:36 PM, [email protected] <[email protected]> wrote:
>>
>>> Yes, the socket appender blocks. Maybe the async appender of log4j can
>>> do better ...
>>>
>>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>>
>>> Jörg
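[A minimal sketch of that suggestion, assuming log4j 1.x and a hypothetical Logstash host and port; Elasticsearch would normally express this in its logging configuration rather than in code, so the programmatic form is only illustrative. With blocking disabled, the AsyncAppender drops events when its buffer fills instead of stalling the calling (search) thread on the socket:]

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncSocketLogging {
    public static void main(String[] args) {
        // Wrap the blocking SocketAppender in an AsyncAppender so that
        // logging callers never wait on the network.
        SocketAppender socket = new SocketAppender("logstash-host", 4560); // placeholder host/port
        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(512);  // events buffered in memory before the full-buffer policy applies
        async.setBlocking(false);  // on a full buffer, drop/summarize events instead of blocking
        async.addAppender(socket);
        Logger.getRootLogger().addAppender(async);

        Logger.getLogger(AsyncSocketLogging.class).info("logged without blocking the caller");
    }
}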
>>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic <[email protected]> wrote:
>>>
>>>> Forgot to mention the thread dumps. I have taken them before, but not
>>>> this time. Most of the blocked search pool threads are stuck in log4j.
>>>>
>>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>>
>>>> I do have a socket appender to logstash (elasticsearch logs in
>>>> elasticsearch!). Let me debug this connection.
>>>>
>>>> -- Ivan
>>>>
>>>> On Sun, Jul 6, 2014 at 1:55 PM, [email protected] <[email protected]> wrote:
>>>>
>>>>> Is there anything in a thread dump that looks like stray queries?
>>>>> Maybe some facet queries hung while resources ran low and never
>>>>> returned?
>>>>>
>>>>> Jörg
>>>>>
>>>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic <[email protected]> wrote:
>>>>>
>>>>>> Having an issue on one of my clusters running version 1.1.1 with 8
>>>>>> master/data nodes, unicast, connecting via the Java TransportClient.
>>>>>> A few REST queries are executed via monitoring services.
>>>>>>
>>>>>> Currently there is almost no traffic on this cluster. The few
>>>>>> queries that are currently running are either small test queries or
>>>>>> large facet queries (which are infrequent; the longest runs for 16
>>>>>> seconds). What I am noticing is that the active search thread count
>>>>>> on some nodes never decreases, and when it reaches the limit, the
>>>>>> entire cluster will stop accepting requests. The current max is the
>>>>>> default (3 x 8).
>>>>>>
>>>>>> http://search06:9200/_cat/thread_pool
>>>>>>
>>>>>> search05 1.1.1.5  0 0 0 0 0 0 19 0 0
>>>>>> search07 1.1.1.7  0 0 0 0 0 0  0 0 0
>>>>>> search08 1.1.1.8  0 0 0 0 0 0  0 0 0
>>>>>> search09 1.1.1.9  0 0 0 0 0 0  0 0 0
>>>>>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>>>>>> search06 1.1.1.6  0 0 0 0 0 0  2 0 0
>>>>>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>>>>>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>>>>>
>>>>>> In this case, both search05 and search06 have an active search
>>>>>> thread count that does not change. If I run a query against
>>>>>> search05, the search responds quickly and the total number of active
>>>>>> search threads does not increase.
>>>>>>
>>>>>> So I have two related issues:
>>>>>> 1) the active thread count does not decrease
>>>>>> 2) the cluster will not accept requests if one node becomes unstable
>>>>>>
>>>>>> I have seen the issue intermittently in the past, but it has started
>>>>>> again and cluster restarts do not fix the problem. At the log level,
>>>>>> there have been issues with the cluster state not propagating. Not
>>>>>> every node will acknowledge the cluster state
>>>>>> ([discovery.zen.publish] received cluster state version NNN) and the
>>>>>> master would log a timeout (awaiting all nodes to process published
>>>>>> state NNN timed out, timeout 30s). The nodes are fine and can ping
>>>>>> each other with no issues. Currently not seeing any log errors with
>>>>>> the thread pool issue, so perhaps it is a red herring.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
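[A note on the unlabeled output above: in 1.x the default _cat/thread_pool columns are, as far as I can tell, active/queue/rejected for the bulk, index, and search pools, so search05 is sitting at 19 active search threads out of the 24-thread (3 x 8) default max. A small watcher sketch along those lines, assuming the same placeholder host, that polls the same stats over the Java API (using the bounded actionGet discussed above) so a pool that never drains can be alerted on before it saturates:]

import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.threadpool.ThreadPoolStats;

public class SearchPoolWatcher {
    public static void main(String[] args) {
        TransportClient client = new TransportClient()
                .addTransportAddress(new InetSocketTransportAddress("search06", 9300)); // placeholder host
        try {
            // Fetch only thread pool stats from every node, with a bounded client wait.
            NodesStatsResponse stats = client.admin().cluster().prepareNodesStats()
                    .clear()
                    .setThreadPool(true)
                    .execute()
                    .actionGet(TimeValue.timeValueSeconds(5));

            for (NodeStats node : stats.getNodes()) {
                for (ThreadPoolStats.Stats pool : node.getThreadPool()) {
                    if ("search".equals(pool.getName()) && pool.getActive() > 0) {
                        // A persistently non-zero active count is the symptom described above.
                        System.out.printf("%s search: active=%d queue=%d rejected=%d%n",
                                node.getNode().getName(), pool.getActive(),
                                pool.getQueue(), pool.getRejected());
                    }
                }
            }
        } finally {
            client.close();
        }
    }
}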
