Slow cluster startup with zen discovery and large number of nodes.

Michel Conrad Fri, 21 Feb 2014 06:41:00 -0800

Hi,

Starting a cluster with 100 nodes takes half an hour just for the
nodes to join in elasticsearch version 1.0. In version 0.19.8, nodes
joined the cluster very quickly. The issue seems to come from the
master node sending the updated cluster state to all nodes after
every single node addition and then waiting for them to acknowledge
the update before adding the next node (zen-disco-receive).
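
As a rough back-of-the-envelope check of these numbers, here is a small sketch. It assumes the master really does process joins strictly one at a time and blocks for a fixed wait after each publish. The function names are hypothetical and not part of elasticsearch; this is a toy model, not a description of the actual zen discovery code.

```python
def average_join_seconds(total_startup_seconds: float, node_count: int) -> float:
    """Average wall-clock time per join, assuming joins are handled one by one."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return total_startup_seconds / node_count


def serial_startup_seconds(node_count: int, wait_per_publish_seconds: float) -> float:
    """Toy model (assumption): the master blocks for a fixed wait after each
    cluster state publish, so the total startup time grows linearly with the
    number of joining nodes."""
    if node_count < 0 or wait_per_publish_seconds < 0:
        raise ValueError("arguments must be non-negative")
    return node_count * wait_per_publish_seconds


if __name__ == "__main__":
    # Observed: 100 nodes take about half an hour to join.
    per_join = average_join_seconds(30 * 60, 100)
    print(per_join)  # 18.0 seconds per join on average

    # With no blocking wait at all, the model predicts no serial delay.
    print(serial_startup_seconds(100, 0.0))  # 0.0
```

In this model, half an hour for 100 nodes works out to about 18 seconds of waiting per join, which would fit the master blocking on acknowledgements after each publish.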


Setting discovery.zen.publish_timeout:0 seems to resolve the issue
during startup, because the master node does not block anymore, but I
am not sure if something can go wrong afterwards while running the
cluster with the timeout set to 0.

I also tried increasing the kernel connection limits, but it did not
make a difference:
sysctl -w net.ipv4.tcp_max_syn_backlog=20480
sysctl -w net.core.somaxconn=8192
sysctl -w net.ipv4.tcp_syncookies=1
sysctl -w net.ipv4.tcp_synack_retries=1

So my questions are: is it safe to run the cluster with
discovery.zen.publish_timeout set to 0? Is it expected that zen
discovery does not perform well with a larger number of nodes, or
might there still be something wrong with my setup?

Thanks in advance,
Michel

