Hi,

Starting a cluster with 100 nodes in Elasticsearch 1.0 takes half an hour just for the nodes to join. In version 0.19.8, nodes joined the cluster very quickly. The issue seems to come from the master node: after every single node is added, it sends the updated cluster state to all nodes in the cluster and then waits for them to acknowledge the update before adding the next node (zen-disco-receive).
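To put a rough number on this, here is a back-of-the-envelope sketch. It assumes (my assumption, not something I have confirmed in the code) that each join is published to every node already in the cluster and that the master waits for all acks before handling the next join. The 100 nodes and 30 minutes are the figures from my startup above; the variable names are just for illustration.

```shell
#!/usr/bin/env bash
# Rough model (assumption): join k is published to k nodes, and the
# master waits for every ack before it processes the next join.
nodes=100
startup_seconds=$((30 * 60))   # "half an hour", as observed

# Total cluster-state publish messages: 1 + 2 + ... + nodes.
messages=$(( nodes * (nodes + 1) / 2 ))

# Average wall-clock time spent per join (integer seconds).
per_join=$(( startup_seconds / nodes ))

echo "publish messages: ${messages}"   # prints 5050
echo "seconds per join: ${per_join}"   # prints 18
```

About 18 seconds per join seems far too long for just adding a node to the cluster state, which is why I suspect the time goes into waiting for acknowledgements.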
Setting discovery.zen.publish_timeout: 0 seems to resolve the issue during startup, because the master node no longer blocks, but I am not sure whether something can go wrong later, while the cluster is running with the timeout set to 0.

I also tried increasing the kernel TCP connection settings, but it did not make a difference:

    sysctl -w net.ipv4.tcp_max_syn_backlog=20480
    sysctl -w net.core.somaxconn=8192
    sysctl -w net.ipv4.tcp_syncookies=1
    sysctl -w net.ipv4.tcp_synack_retries=1

So my questions are: Is it safe to run the cluster with discovery.zen.publish_timeout set to 0? Is it expected behavior that Zen discovery does not perform well with a larger number of nodes? Or might there still be something wrong with my setup?

Thanks in advance,
Michel
