Cluster gets stuck after full re-index

Florian Munz Tue, 03 Jun 2014 03:14:30 -0700

Hello,

We recently moved our ES cluster from dedicated hardware to AWS instances. 
The new instances have less memory available but use SSDs for the ES data 
directory. We kept the JVM (1.7.0_17) and ES (0.90.9) versions exactly the 
same. On the new hardware, our cluster gets stuck after a full re-index. The 
re-index consists of creating a new index, pointing one alias to the new 
index and one alias to the old index, sending realtime updates to both 
aliases, and running a script to fill up the new index.
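
For reference, the final alias switch could look roughly like the sketch 
below. It only builds the JSON body for a POST to the _aliases endpoint, which 
removes and adds an alias in a single request. The alias and index names 
(items_read, items_v1, items_v2) and the helper alias_actions are hypothetical 
and not taken from our setup.

```python
import json


def alias_actions(alias, old_index, new_index):
    """Build a request body for POST /_aliases that moves `alias`
    from `old_index` to `new_index` in one request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, used only for illustration.
    body = alias_actions("items_read", "items_v1", "items_v2")
    print(json.dumps(body, indent=2))
```

The printed body would then be sent to the cluster with any HTTP client.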


10 minutes after the re-index finishes and we move both aliases to the new 
index, ES stops answering any search or index requests. The only errors in 
the logs are rejections like this one:

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: 
rejected execution (queue capacity 1000) on 
org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5
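
As a rough illustration of what this rejection suggests, here is a toy model 
of a fixed-size pool with a bounded queue. It is not how ES implements its 
thread pools. The class StuckPool and the worker count of 3 are assumptions 
made for the example; only the queue capacity of 1000 comes from the log line 
above. If the workers never finish, the queue fills up and every further 
request is rejected, even though nothing is using CPU.

```python
class StuckPool:
    """Toy model of a fixed-size pool with a bounded queue."""

    def __init__(self, workers, queue_capacity):
        self.workers = workers
        self.queue_capacity = queue_capacity
        self.running = 0  # tasks holding a worker (they never finish here)
        self.queued = 0   # tasks waiting for a free worker

    def submit(self):
        """Return True if the task is accepted, False if it is rejected."""
        if self.running < self.workers:
            self.running += 1
            return True
        if self.queued < self.queue_capacity:
            self.queued += 1
            return True
        return False  # comparable to a rejected execution


# Worker count is hypothetical; queue capacity matches the log message.
pool = StuckPool(workers=3, queue_capacity=1000)
results = [pool.submit() for _ in range(1005)]
print(results.count(True), "accepted,", results.count(False), "rejected")
# prints: 1003 accepted, 2 rejected
```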

CPU load is low, so it doesn't look like ES is doing anything expensive. A 
request to hot_threads times out. I've put the output from jstack and jmap 
here:

https://gist.github.com/theflow/b983d512ea344545f7f6

We tried upgrading to 0.90.13, since the changelog mentioned a problem with 
infinite loops, but we see the same behavior. We're planning to upgrade to a 
more recent version of ES soon, but it will take a while to fully test that.


Any ideas what could be causing this?


Thanks,
Florian
