Yeah, already traced it back myself. Been using Elasticsearch for years and
have only been setting query timeouts. Need to rework the client code to
incorporate client-side timeouts as well.
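
Probably something along the lines of the timed actionGet() variants Jörg
mentions below. A minimal sketch, with the class name, index name, query,
and timeout values all placeholders:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;

public class TimedSearch {

    // Bound the client-side wait instead of blocking forever in actionGet().
    // The request-level timeout only limits the search phase on the data
    // nodes; the TimeValue passed to actionGet() caps how long this thread
    // waits for the response.
    static SearchResponse searchWithTimeout(Client client) {
        return client.prepareSearch("my-index")             // placeholder index
                .setQuery(QueryBuilders.matchAllQuery())    // placeholder query
                .setTimeout(TimeValue.timeValueSeconds(5))  // search-phase timeout
                .execute()
                .actionGet(TimeValue.timeValueSeconds(10)); // client-side limit
    }
}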

Had two Elasticsearch meltdowns this weekend, after a long period of
stability. Both of them different and unique!

-- 
Ivan


On Mon, Jul 7, 2014 at 1:50 PM, [email protected] <[email protected]> wrote:

> Yes, actionGet() can be traced down to AbstractQueuedSynchronizer's
> acquireSharedInterruptibly(-1) call
>
>
> http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)
>
> in org.elasticsearch.common.util.concurrent.BaseFuture which "waits"
> forever until interrupted. But there are twin methods, like actionGet(long
> millis), that time out.
>
> Jörg
>
>
> On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic <[email protected]> wrote:
>
>> Still analyzing all the logs and dumps that I have accumulated so far,
>> but it looks like the blocking socket appender might be the issue. After
>> the affected node exhausts all of its search threads, the TransportClient
>> will still issue requests to it, even though the other nodes have no
>> issues. After a while, the client application will also be blocked
>> waiting for Elasticsearch to return.
>>
>> I removed the logging for now and will re-implement it with a service
>> that reads directly from the duplicate file-based log. Although I have a
>> timeout specified for my query, my recollection of the search code is
>> that it only applies to the Lucene TimeLimitingCollector (it's been a
>> while since I looked at that code). The next step should be to add an
>> explicit timeout to actionGet(). Is the default basically an unbounded
>> wait?
>>
>> It might be a challenge for the cluster to avoid delegating queries to
>> overloaded nodes.
>>
>> Cheers,
>>
>> Ivan
>>
>>
>> On Sun, Jul 6, 2014 at 2:36 PM, [email protected] <[email protected]> wrote:
>>
>>> Yes, the socket appender blocks. Maybe log4j's async appender can do
>>> better ...
>>>
>>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
>>>
>>> Jörg
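
For reference, the async route would look something like the following with
log4j 1.2, configured programmatically. This is a rough sketch only; the
class name, logstash host, port, and buffer size are placeholders:

import org.apache.log4j.AsyncAppender;
import org.apache.log4j.Logger;
import org.apache.log4j.net.SocketAppender;

public class AsyncLogstashAppender {

    // Wrap the blocking SocketAppender in log4j's AsyncAppender so a dead or
    // slow logstash connection stalls the background dispatcher thread, not
    // the search threads that emit the log events.
    static void install() {
        SocketAppender socket =
                new SocketAppender("logstash.example.com", 4560); // placeholders
        socket.setReconnectionDelay(10000);

        AsyncAppender async = new AsyncAppender();
        async.setBufferSize(500);
        async.setBlocking(false); // drop events instead of blocking callers
        async.addAppender(socket);

        Logger.getRootLogger().addAppender(async);
    }
}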
>>>
>>>
>>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic <[email protected]> wrote:
>>>
>>>> Forgot to mention the thread dumps. I have taken them before, but not
>>>> this time. Most of the blocked search threads are stuck in log4j.
>>>>
>>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>>
>>>> I do have a socket appender to logstash (elasticsearch logs in
>>>> elasticsearch!). Let me debug this connection.
>>>>
>>>> --
>>>> Ivan
>>>>
>>>>
>>>> On Sun, Jul 6, 2014 at 1:55 PM, [email protected] <[email protected]> wrote:
>>>>
>>>>> Is there anything in a thread dump that looks like stray queries?
>>>>> Maybe some facet queries hung while resources ran low and never
>>>>> returned?
>>>>>
>>>>> Jörg
>>>>>
>>>>>
>>>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic <[email protected]> wrote:
>>>>>
>>>>>> Having an issue on one of my clusters running version 1.1.1 with 8
>>>>>> master/data nodes, unicast, connecting via the Java TransportClient. A 
>>>>>> few
>>>>>> REST queries are executed via monitoring services.
>>>>>>
>>>>>> Currently there is almost no traffic on this cluster. The few queries
>>>>>> that are running are either small test queries or large facet queries
>>>>>> (which are infrequent; the longest runs for 16 seconds). What I am
>>>>>> noticing is that the active search thread count on some nodes never
>>>>>> decreases, and when it reaches the limit, the entire cluster will stop
>>>>>> accepting requests. The current max is the default (3 x 8).
>>>>>>
>>>>>> http://search06:9200/_cat/thread_pool
>>>>>>
>>>>>> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
>>>>>> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
>>>>>> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
>>>>>> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
>>>>>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>>>>>> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
>>>>>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>>>>>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>>>>>
>>>>>> In this case, both search05 and search06 have an active thread count
>>>>>> that does not change. If I run a query against search05, the search will
>>>>>> respond quickly and the total number of active search threads does not
>>>>>> increase.
>>>>>>
>>>>>> So I have two related issues:
>>>>>> 1) the active thread count does not decrease, and
>>>>>> 2) the cluster will not accept requests if one node becomes unstable.
>>>>>>
>>>>>> I have seen the issue intermittently in the past, but it has started
>>>>>> again and cluster restarts do not fix the problem. At the log level,
>>>>>> there have been issues with the cluster state not propagating. Not
>>>>>> every node will acknowledge the cluster state ([discovery.zen.publish]
>>>>>> received cluster state version NNN) and the master will log a timeout
>>>>>> (awaiting all nodes to process published state NNN timed out, timeout
>>>>>> 30s). The nodes are fine and can ping each other with no issues. I am
>>>>>> currently not seeing any log errors related to the thread pool issue,
>>>>>> so perhaps it is a red herring.
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>>

