Hi
I have an ES cluster with 27 nodes (3 master, 24 data). At times I see a 
burst of nodes leaving and rejoining within a couple of minutes. Each node 
has 16 GB allocated for the JVM heap and is nowhere near that limit. There 
are no memory issues, and there were no search or index operations running 
when this occurred. Even so, quite a few node disconnected messages 
suddenly appear on the master. It doesn’t happen all the time, only in 
bursts.

 

During this time, on the master, I see a NodeDisconnectedException for a 
node. On that node, I see messages that say “master left (reason = 
transport disconnected)”. I don't think it's split-brain, though with the 
number of messages in the logs it's hard to tell. Also, the minimum master 
nodes setting is set to 2. The outcome is that a whole lot of shards get 
moved around.


I'd like to involve our network specialists to troubleshoot connectivity, 
but I'm not sure what to ask them to look for. In what scenarios does 
Elasticsearch report a node as disconnected? Should they be looking at TCP 
connectivity, running ping tests, etc.?
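
To make the TCP part of the question concrete, here is a rough sketch of 
the kind of check I could leave running between nodes until the next burst. 
It only times a plain TCP connect. The function names are made up for 
illustration, and port 9300 is my assumption for the transport port (the 
default, as far as I know). It says nothing about how Elasticsearch itself 
decides that a node is disconnected.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect; return (ok, elapsed_seconds, error_text)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, time.monotonic() - start, str(exc)


def probe_all(hosts, port=9300, timeout=2.0):
    """Probe each host once; return (time, host, ok, seconds, error) rows."""
    rows = []
    for host in hosts:
        ok, elapsed, err = probe(host, port, timeout)
        stamp = time.strftime("%H:%M:%S")
        rows.append((stamp, host, ok, round(elapsed, 3), err))
    return rows
```

Calling probe_all() with the real node names every few seconds from one of 
the master nodes, and logging the rows, would at least show whether plain 
TCP connects fail or slow down at the same time as the disconnects.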

Also, are there timeout values that can be configured to reduce false 
positives for node disconnected events?


Thanks

Darshat
