We're using ES 1.1.0 for central log storage and searching. When we use 
Kibana to search a month's worth of data, our cluster becomes 
unresponsive. By unresponsive I mean that many nodes will respond 
immediately to a 'curl localhost:9200', but a couple will not. As a result, 
no cluster metrics are available when querying the master, and we're 
unable to set any cluster-level settings.

We're getting these kinds of errors in the logs:
[2014-05-05 19:10:50,763][WARN ][transport.netty          ] [Leap-Frog] 
exception caught on transport layer [[id: 0x4b074069, /10.6.10.211:57563 => 
/10.6.10.148:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space

The cluster never seems to recover either, and that is my biggest concern. 
So my questions are:
1. Is it normal for the entire cluster to close up shop because a 
couple of nodes are unresponsive? I thought the field data circuit breaker 
would prevent this, but maybe this is a different problem.
2. What's the best way to get ES to recover from this scenario? I don't really 
want to restart just the two nodes, as we have >1 TB of data on each node, 
but issuing a disable_allocation fails because the setting cannot be written 
to all nodes in the cluster.
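For reference, here's roughly what I mean by the circuit breaker and the 
disable_allocation call (a sketch against the ES 1.x cluster settings API; 
the 60% limit is just an illustrative value, not what we actually run):

```shell
# Fielddata circuit breaker limit -- dynamic in ES 1.1 (the setting was
# renamed to indices.breaker.fielddata.limit in later 1.x releases).
# I expected this to make big Kibana facets fail fast instead of OOMing.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.fielddata.breaker.limit": "60%"
  }
}'

# Disabling shard allocation before bouncing the bad nodes -- this is
# the call that fails for us, presumably because the cluster state
# update cannot be acknowledged by the unresponsive nodes.
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'
```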
