What is the proper way to perform a rolling restart of a cluster? My stop script currently waits for the cluster health to be green before it stops its node. Unfortunately, this doesn't appear to be working.
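A health check like the one described above could be made stricter than looking at the status alone. Below is a minimal Python sketch, assuming the Elasticsearch 1.x cluster health API on localhost:9200. It asks _cluster/health to block until the cluster is green and wait_for_nodes is satisfied for the expected node count. It then also refuses to stop if the response timed out or shows relocating, initializing, or unassigned shards. The constants and function names are hypothetical. The shutdown URL is the one quoted in this post and only applies to 1.x.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Hypothetical values for a 3-node cluster like the one described here.
ES_URL = "http://localhost:9200"
EXPECTED_NODES = 3


def health_url(base_url, expected_nodes, timeout="60s"):
    """Build a _cluster/health URL that blocks until the cluster is green
    AND has the expected number of nodes, or until the timeout expires."""
    params = urllib.parse.urlencode({
        "wait_for_status": "green",
        "wait_for_nodes": str(expected_nodes),
        "timeout": timeout,
    })
    return f"{base_url}/_cluster/health?{params}"


def is_safe_to_stop(health, expected_nodes):
    """Treat 'green' alone as insufficient: also require the full node count,
    no timeout, and no shard movement in progress."""
    if health is None or health.get("timed_out", True):
        return False
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes") == expected_nodes
        and health.get("relocating_shards") == 0
        and health.get("initializing_shards") == 0
        and health.get("unassigned_shards") == 0
    )


def fetch_health(base_url, expected_nodes):
    """Return the parsed health JSON, or None on any connection error,
    HTTP error (e.g. a timed-out wait), or malformed body."""
    try:
        with urllib.request.urlopen(health_url(base_url, expected_nodes), timeout=90) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):  # URLError/HTTPError are OSError subclasses
        return None


def shutdown_local_node(base_url):
    """POST to the 1.x node shutdown endpoint and return the HTTP status."""
    req = urllib.request.Request(
        f"{base_url}/_cluster/nodes/_local/_shutdown", data=b"", method="POST"
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def stop_if_settled(base_url=ES_URL, expected_nodes=EXPECTED_NODES):
    """Shut the local node down only if the cluster looks fully settled."""
    if not is_safe_to_stop(fetch_health(base_url, expected_nodes), expected_nodes):
        return False
    shutdown_local_node(base_url)
    return True
```

A stop script would call stop_if_settled() and retry later if it returns False. Requiring zero relocating shards is a judgment call. It may be too strict on a cluster that is rebalancing continuously.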
My setup: Elasticsearch 1.0.0, a 3-node cluster with 1 replica. When I perform the rolling restart, the cluster still reports green while a node is down. In theory it should be yellow, since some shards will be unallocated. My script output during a rolling restart:

    1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
    1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
    1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
    1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
    1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
    1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
    curl: (52) Empty reply from server
    1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
    1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
    curl: (52) Empty reply from server
    1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
    1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
    ... continues as green for many more seconds...

Since it reports green, the second node thinks it can stop, and that ends up putting the cluster into a broken red state:

    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
    curl: (52) Empty reply from server
    curl: (52) Empty reply from server
    1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046

My stop script issues a call to http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node. Is it possible that the other nodes wait for the down node to time out before moving into the yellow state? I assumed the shutdown API call would inform the other nodes that the node is going down. I'd appreciate any help on how to do this properly.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/baba0a96-a991-42e3-a827-43881240e889%40googlegroups.com.
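To make a polling log like the one above easier to read, it may help to name its columns. The post doesn't say which endpoint the script polls. This sketch assumes it is _cat/health, because the eleven fields per row match that endpoint's default Elasticsearch 1.0 columns. It parses one line and applies a stricter "settled" check than status alone. The column list, function names, and the expected node count are assumptions for illustration.

```python
# Assumed default _cat/health columns in Elasticsearch 1.0; later versions
# append more columns, so this list would need adjusting there.
CAT_HEALTH_COLUMNS = [
    "epoch", "timestamp", "cluster", "status", "node.total", "node.data",
    "shards", "pri", "relo", "init", "unassign",
]
INT_COLUMNS = ("epoch", "node.total", "node.data", "shards", "pri",
               "relo", "init", "unassign")


def parse_cat_health(line):
    """Map one _cat/health output line to a dict keyed by column name.
    Returns None for lines that are not health rows (e.g. curl errors)."""
    fields = line.split()
    if len(fields) != len(CAT_HEALTH_COLUMNS) or not fields[0].isdigit():
        return None
    row = dict(zip(CAT_HEALTH_COLUMNS, fields))
    for key in INT_COLUMNS:
        row[key] = int(row[key])
    return row


def cluster_settled(row, expected_nodes):
    """True only if the row is green, lists every node, and shows no shards
    relocating, initializing, or unassigned."""
    return (
        row is not None
        and row["status"] == "green"
        and row["node.total"] == expected_nodes
        and row["relo"] == 0
        and row["init"] == 0
        and row["unassign"] == 0
    )
```

Applied to the rows in this post with expected_nodes=3, the row at 21:38:59 fails because node.total is 2 even though the status is green. The earlier three-node rows also fail, because relo is 2 in each of them.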
