Yes, actionGet() can be traced down to AbstractQueuedSynchronizer's
acquireSharedInterruptibly(-1) call

http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/AbstractQueuedSynchronizer.html#acquireSharedInterruptibly(int)

in org.elasticsearch.common.util.concurrent.BaseFuture, which "waits"
forever until interrupted. But there are overloaded variants, such as
actionGet(long millis), that time out.
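
For example, a rough sketch against the 1.x Java API of bounding that wait
on the client side (cluster name, host, index, and class name here are
placeholders, not from this thread):

    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.Client;
    import org.elasticsearch.client.transport.TransportClient;
    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.transport.InetSocketTransportAddress;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.index.query.QueryBuilders;

    public class BoundedActionGet {
        public static void main(String[] args) {
            Client client = new TransportClient(ImmutableSettings.settingsBuilder()
                    .put("cluster.name", "mycluster")        // placeholder cluster name
                    .build())
                    .addTransportAddress(new InetSocketTransportAddress("search05", 9300));

            // actionGet() with no argument parks the calling thread until the future
            // completes or the thread is interrupted; passing a TimeValue bounds the
            // wait and throws ElasticsearchTimeoutException when it expires.
            SearchResponse response = client.prepareSearch("myindex")   // placeholder index
                    .setQuery(QueryBuilders.matchAllQuery())
                    .execute()
                    .actionGet(TimeValue.timeValueSeconds(10));

            System.out.println("took " + response.getTookInMillis() + "ms");
            client.close();
        }
    }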

Jörg


On Mon, Jul 7, 2014 at 7:53 PM, Ivan Brusic <[email protected]> wrote:

> Still analyzing all the logs and dumps that I have accumulated so far, but
> it looks like the blocking socket appender might be the issue. After that
> node exhausts all of its search threads, the TransportClient will still
> issue requests to it, although other nodes do not have issues. After a
> while, the client application will also be blocked waiting for
> Elasticsearch to return.
>
> I removed logging for now and will re-implement it with a service that
> reads directly from the duplicate file-based log. Although I have a
> timeout set on my query, my recollection of the search code is that it
> only applies to Lucene's time-limiting collector (it's been a while since
> I looked at that code). The next step should be to add an explicit timeout
> to actionGet(). Is the default basically no wait?
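>
> As a rough sketch of the two timeouts side by side (index name and values
> are placeholders; `client` and imports as in the snippet at the top of the
> thread, plus SearchRequestBuilder):
>
>     // Request-level timeout: best effort, checked during collection on each
>     // shard, so it does not bound the whole round trip.
>     SearchRequestBuilder request = client.prepareSearch("myindex")
>             .setQuery(QueryBuilders.matchAllQuery())
>             .setTimeout(TimeValue.timeValueSeconds(5));
>
>     // Client-side timeout: bounds how long the calling thread blocks.
>     // A plain actionGet() waits indefinitely.
>     SearchResponse response = request.execute().actionGet(TimeValue.timeValueSeconds(10));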
>
> It might be a challenge for the cluster to avoid routing queries to
> overloaded nodes.
>
> Cheers,
>
> Ivan
>
>
> On Sun, Jul 6, 2014 at 2:36 PM, [email protected] <
> [email protected]> wrote:
>
>> Yes, the socket appender blocks. Maybe log4j's async appender can do
>> better ...
>>
>> http://ricardozuasti.com/2009/asynchronous-logging-with-log4j/
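>>
>> For instance, a rough sketch of wrapping an existing socket appender in
>> log4j 1.2's AsyncAppender programmatically (class name, host, port, and
>> buffer size are placeholders):
>>
>>     import org.apache.log4j.AsyncAppender;
>>     import org.apache.log4j.Logger;
>>     import org.apache.log4j.net.SocketAppender;
>>
>>     public class AsyncSocketLogging {
>>         public static void install() {
>>             // The SocketAppender writes (and blocks) on the TCP connection itself;
>>             // the AsyncAppender hands events to a background dispatcher thread so
>>             // the logging call in the search thread returns immediately.
>>             SocketAppender socket = new SocketAppender("logstash-host", 4560);  // placeholder host/port
>>
>>             AsyncAppender async = new AsyncAppender();
>>             async.setBufferSize(512);     // events buffered before the policy below applies
>>             async.setBlocking(false);     // discard events instead of blocking when the buffer fills
>>             async.addAppender(socket);
>>
>>             Logger.getRootLogger().addAppender(async);
>>         }
>>     }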
>>
>> Jörg
>>
>>
>> On Sun, Jul 6, 2014 at 11:22 PM, Ivan Brusic <[email protected]> wrote:
>>
>>> Forgot to mention the thread dumps. I have taken them before, but not
>>> this time. Most of the blocked search threads are stuck in log4j.
>>>
>>> https://gist.github.com/brusic/fc12536d8e5706ec9c32
>>>
>>> I do have a socket appender to logstash (elasticsearch logs in
>>> elasticsearch!). Let me debug this connection.
>>>
>>> --
>>> Ivan
>>>
>>>
>>> On Sun, Jul 6, 2014 at 1:55 PM, [email protected] <
>>> [email protected]> wrote:
>>>
>>>> Can anything be seen in a thread dump that looks like stray queries?
>>>> Maybe some facet queries hung while resources ran low and never
>>>> returned?
>>>>
>>>> Jörg
>>>>
>>>>
>>>> On Sun, Jul 6, 2014 at 9:59 PM, Ivan Brusic <[email protected]> wrote:
>>>>
>>>>> Having an issue on one of my clusters running version 1.1.1 with 8
>>>>> master/data nodes, unicast discovery, connecting via the Java
>>>>> TransportClient. A few REST queries are executed by monitoring services.
>>>>>
>>>>> Currently there is almost no traffic on this cluster. The few queries
>>>>> that are running are either small test queries or large facet queries
>>>>> (which are infrequent; the longest runs for 16 seconds). What I am
>>>>> noticing is that the active search thread count on some nodes never
>>>>> decreases, and when it reaches the limit, the entire cluster stops
>>>>> accepting requests. The current max is the default (3 x 8 = 24).
>>>>>
>>>>> http://search06:9200/_cat/thread_pool
>>>>>
>>>>> search05 1.1.1.5 0 0 0 0 0 0 19 0 0
>>>>> search07 1.1.1.7 0 0 0 0 0 0  0 0 0
>>>>> search08 1.1.1.8 0 0 0 0 0 0  0 0 0
>>>>> search09 1.1.1.9 0 0 0 0 0 0  0 0 0
>>>>> search11 1.1.1.11 0 0 0 0 0 0  0 0 0
>>>>> search06 1.1.1.6 0 0 0 0 0 0  2 0 0
>>>>> search10 1.1.1.10 0 0 0 0 0 0  0 0 0
>>>>> search12 1.1.1.12 0 0 0 0 0 0  0 0 0
>>>>>
>>>>> In this case, both search05 and search06 have an active thread count
>>>>> that does not change. If I run a query against search05, the search will
>>>>> respond quickly and the total number of active search threads does not
>>>>> increase.
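>>>>>
>>>>> The same counters can also be pulled from the Java client; a rough
>>>>> sketch (assuming the 1.x nodes stats API, with imports from
>>>>> org.elasticsearch.action.admin.cluster.node.stats and
>>>>> org.elasticsearch.threadpool):
>>>>>
>>>>>     NodesStatsResponse stats = client.admin().cluster().prepareNodesStats()
>>>>>             .clear()
>>>>>             .setThreadPool(true)
>>>>>             .execute()
>>>>>             .actionGet(TimeValue.timeValueSeconds(10));
>>>>>
>>>>>     // Print the search pool counters per node, the same numbers that
>>>>>     // _cat/thread_pool reports.
>>>>>     for (NodeStats node : stats.getNodes()) {
>>>>>         for (ThreadPoolStats.Stats pool : node.getThreadPool()) {
>>>>>             if ("search".equals(pool.getName())) {
>>>>>                 System.out.println(node.getNode().getName()
>>>>>                         + " active=" + pool.getActive()
>>>>>                         + " queue=" + pool.getQueue()
>>>>>                         + " rejected=" + pool.getRejected());
>>>>>             }
>>>>>         }
>>>>>     }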
>>>>>
>>>>> So I have two related issues:
>>>>> 1) the active thread count does not decrease
>>>>> 2) the cluster will not accept requests if one node becomes unstable.
>>>>>
>>>>> I have seen the issue intermittently in the past, but it has started
>>>>> again, and cluster restarts do not fix the problem. At the log level,
>>>>> there have been issues with the cluster state not propagating. Not
>>>>> every node will acknowledge the cluster state ([discovery.zen.publish]
>>>>> received cluster state version NNN) and the master would log a timeout
>>>>> (awaiting all nodes to process published state NNN timed out, timeout
>>>>> 30s). The nodes are fine and can ping each other with no issues. I am
>>>>> currently not seeing any log errors alongside the thread pool issue,
>>>>> so perhaps it is a red herring.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Ivan
>>>>>