I currently have an Elasticsearch cluster with 7 nodes.  Some of the links 
between nodes run over fiber with 2-3 ms of latency.  About once a day a 
node drops from the cluster, a new master is elected, and the dropped node 
rejoins the cluster, usually 30-45 seconds later.  The fault-detection 
settings on all nodes have been raised as shown below to tolerate the 
slightly higher latency, but nodes still hit the timeout and drop.

Is it expected that even 2 ms of latency would cause issues with the 
cluster?  If so, is there further configuration needed to make the cluster 
more tolerant of it?  Or is this much latency normal, and should I 
investigate other root causes for the occasional node drops?  I've 
confirmed that we're never actually dropping packets between nodes, so 
something else is causing a node to miss 5 consecutive pings, each with a 
60s timeout.

zen-disco-node_failed([CDPX-PRD-ELS4][lkquUBfHT1aXAO3-_tCNCg][cdpx-prd-els4][inet[10.9.64.142/10.9.64.142:9300]]{master=false}), reason failed to ping, tried [5] times, each with maximum [1m] timeout

discovery.zen.fd.ping_interval: 15s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
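
To put those settings in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes (this is an assumption, not something confirmed from the Elasticsearch source) that after a timed-out ping the retries are sent back to back, so a node is only marked failed after roughly ping_interval + ping_retries * ping_timeout of continuous silence. The function names are made up for illustration.

```python
def worst_case_detection_seconds(ping_interval_s, ping_timeout_s, ping_retries):
    """Approximate time a node must stay silent before it is marked failed.

    Assumption: one ping_interval elapses before the first failing ping is
    sent, then each of the ping_retries attempts waits the full timeout.
    """
    return ping_interval_s + ping_timeout_s * ping_retries


def timeout_to_latency_ratio(ping_timeout_s, latency_ms):
    """How many times larger a single ping timeout is than the link latency."""
    return ping_timeout_s * 1000.0 / latency_ms


if __name__ == "__main__":
    # Values taken from the settings above and the 2-3 ms latency mentioned.
    window = worst_case_detection_seconds(15, 60, 5)
    ratio = timeout_to_latency_ratio(60, 3)
    print("silent window before failure: about %d s" % window)
    print("one ping timeout vs. 3 ms latency: %.0fx" % ratio)
```

Under that assumption the node would have to be unresponsive for around five minutes, and a single ping timeout is already thousands of times larger than the link latency.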

