We're using ES 1.1.0 for central log storage and searching. When we use Kibana to search a month's worth of data, our cluster becomes unresponsive. By unresponsive I mean that most nodes will respond immediately to a 'curl localhost:9200', but a couple will not. As a result, no cluster metrics are available when querying the master, and we're unable to set any cluster-level settings.
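For what it's worth, this is roughly how we tell which nodes are hung (a sketch: the IPs are the two that appear in the transport log entry below, and the 5-second timeout is an arbitrary choice):

```shell
# Probe each node with a short timeout; a hung node simply won't answer,
# while healthy nodes return the standard ES banner JSON immediately.
for host in 10.6.10.211 10.6.10.148; do
  if curl -s -m 5 "http://$host:9200/" > /dev/null; then
    echo "$host: responding"
  else
    echo "$host: NOT responding"
  fi
done
```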
We're getting these kinds of errors in the logs:

[2014-05-05 19:10:50,763][WARN ][transport.netty ] [Leap-Frog] exception caught on transport layer [[id: 0x4b074069, /10.6.10.211:57563 => /10.6.10.148:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space

The cluster never seems to recover either, and that is my biggest concern. So my questions are:

1. Is it normal for the entire cluster to just close up shop because a couple of nodes are unresponsive? I thought the field data circuit breaker would prevent this, but maybe this is a different problem.

2. How do we best get ES to recover from this scenario? I don't really want to restart just the two nodes, as we have >1 TB of data on each node, but issuing a disable_allocation fails because the setting cannot be written to all nodes in the cluster.

To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/2fb4e427-cf95-4882-bd87-728fbfef10dd%40googlegroups.com.
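For reference, these are roughly the calls involved, as a sketch based on my reading of the ES 1.x docs (the setting names are the 1.x-era ones; I'm assuming the pre-1.0 disable_allocation flag is still honored in 1.1.0):

```shell
# The allocation-disable call that fails for us while the cluster is wedged
# (cluster.routing.allocation.disable_allocation is the ES 1.x setting name):
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.disable_allocation": true
  }
}'

# The fielddata circuit breaker I assumed would bound these Kibana queries.
# indices.fielddata.breaker.limit defaults to 60% of heap in ES 1.1:
curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "persistent": {
    "indices.fielddata.breaker.limit": "60%"
  }
}'
```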
