Hello, we recently moved our ES cluster from dedicated hardware to AWS instances. The new instances have less memory available, but use SSDs for the ES data directory. We kept the JVM (1.7.0_17) and ES (0.90.9) versions exactly the same. On the new hardware, our cluster gets stuck after running a full re-index. The re-index consists of creating a new index, pointing one alias to the new index and one alias to the old index, sending realtime updates to both aliases, and running a script to fill up the new index.
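For reference, the alias move at the end of such a re-index can be sketched roughly as follows. This is only an illustration, not our actual script: the alias name "search", the index names "items_v1" and "items_v2", and both helper functions are hypothetical. It assumes the alias is repointed with a single request to the _aliases endpoint, so the remove and add happen together.

```python
import json
import urllib.request


def build_alias_swap(alias, old_index, new_index):
    """Build an _aliases request body that repoints `alias` in one call."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


def send_alias_swap(base_url, body):
    """POST the body to the cluster's _aliases endpoint (not called below)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical names; only prints the request body, no cluster is contacted.
    print(json.dumps(build_alias_swap("search", "items_v1", "items_v2"), indent=2))
```

Calling send_alias_swap("http://localhost:9200", body) would submit the swap, assuming a node is reachable at that (assumed) address.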
10 minutes after the re-index finishes and we move both aliases to the new index, ES stops answering any search or index queries. The only errors in the logs are the rejections that show it is no longer answering queries:

org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution (queue capacity 1000) on org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$4@172018e5

CPU load is low, so it doesn't look like it's doing anything expensive. A request to hot_threads times out. I've put the output from jstack and jmap here: https://gist.github.com/theflow/b983d512ea344545f7f6

We tried upgrading to 0.90.13, since the changelog mentioned a problem with infinite loops, but we see the same behavior. We're planning to upgrade to a more recent version of ES soon, but it will take a while to fully test that.

Any ideas what could be causing this?

Thanks,
Florian
