I currently have an Elasticsearch cluster with 7 nodes. Some of the nodes
are connected over fiber links with 2-3 ms of latency between them. About
once a day a node drops from the cluster, a new master is elected, and the
dropped node rejoins the cluster, usually 30-45 seconds later. The fault
detection settings on all nodes have been raised as shown below to
tolerate the slightly higher latency, but nodes still time out and drop.
Is it expected that even 2 ms of latency would cause issues with the
cluster? If so, is there further configuration needed to make the cluster
more tolerant of it? Or is this latency within normal bounds, meaning I
should investigate other root causes for the occasional drops? I've
confirmed that we never actually drop packets between nodes, so something
else is causing a node to miss 5 consecutive pings, each with a 60s
timeout.

Log entry when a node drops:

zen-disco-node_failed([CDPX-PRD-ELS4][lkquUBfHT1aXAO3-_tCNCg][cdpx-prd-els4][inet[10.9.64.142/10.9.64.142:9300]]{master=false}), reason failed to ping, tried [5] times, each with maximum [1m] timeout

Configuration on all nodes:

discovery.zen.fd.ping_interval: 15s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 5
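
For scale, here is a rough sketch of the arithmetic behind these settings.
It assumes every retry waits out the full timeout and that retries follow
one another back to back; the real fault detection may fail a ping sooner
(the log says "maximum [1m]"), so treat the result as an upper bound. The
function names are my own and not part of Elasticsearch.

```python
def worst_case_detection_seconds(ping_timeout_s: float, ping_retries: int) -> float:
    """Upper bound on how long a node can stay silent before it is failed.

    Assumption: each of the retries waits the full timeout, and retries
    are issued back to back.
    """
    return ping_timeout_s * ping_retries


def timeout_to_latency_ratio(ping_timeout_s: float, latency_ms: float) -> float:
    """How many times larger the ping timeout is than the link latency."""
    return (ping_timeout_s * 1000.0) / latency_ms


if __name__ == "__main__":
    # Values taken from the settings and latency figures in this post.
    ping_timeout_s = 60
    ping_retries = 5
    worst_latency_ms = 3

    window = worst_case_detection_seconds(ping_timeout_s, ping_retries)
    ratio = timeout_to_latency_ratio(ping_timeout_s, worst_latency_ms)

    print(f"worst-case silent window: {window:.0f}s ({window / 60:.0f} min)")
    print(f"timeout is {ratio:.0f}x the link latency")
```

Under those assumptions the timeout is several orders of magnitude larger
than the link latency, and a node could be unresponsive for up to 5 minutes
before being removed.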