My scripts wait for yellow before waiting for green because, as you noticed, the cluster does not enter a yellow state immediately after a cluster event (shutdown, replica change).
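For reference, a minimal sketch of that double wait (assuming curl, a node that stays up during the restart reachable at $ES_URL, and the _cluster/health API's wait_for_status and timeout parameters; note that wait_for_status=yellow returns immediately while the cluster still reports green, so the drop from green has to be polled for explicitly):

  ES_URL="http://localhost:9200"   # point this at a node that stays up

  # 1. Give the cluster a chance to actually register the node loss,
  #    i.e. poll (bounded) until the health status drops from green.
  for i in $(seq 1 60); do
    curl -s "$ES_URL/_cluster/health" | grep -q '"status":"green"' || break
    sleep 1
  done

  # 2. Now block until all shards are assigned and the cluster is green again.
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=30m" > /dev/null

The bounded poll in step 1 is what keeps a rolling-restart script from racing ahead while the cluster still briefly reports green after a node has gone down.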
--
Ivan

On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks <[email protected]> wrote:

> That is exactly what I'm doing. For some reason the cluster reports as
> green even though an entire node is down. The cluster doesn't seem to
> notice the node is gone and change to yellow until many seconds later. By
> then my rolling restart script has already gotten to the second node and
> killed it because the cluster was still green.
>
> On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
>> Mike,
>>
>> Your script needs to check the status of the cluster before shutting
>> down a node, i.e. if the state is yellow, wait until it becomes green
>> again before shutting down the next node. You'll probably want to disable
>> allocation of shards while each node is being restarted (and enable it
>> when the node comes back) in order to minimize the amount of data that
>> needs to be rebalanced.
>> Also make sure 'discovery.zen.minimum_master_nodes' is correctly set in
>> your elasticsearch.yml file.
>>
>> Meta code:
>>
>> for node in $cluster_nodes; do
>>   if [ $cluster_status == 'green' ]; then
>>     cluster_disable_allocation()
>>     shutdown_node($node)
>>     wait_for_node_to_rejoin()
>>     cluster_enable_allocation()
>>     wait_for_cluster_status_green()
>>   fi
>> done
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
>>
>> /petter
>>
>> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks <[email protected]> wrote:
>>
>>> What is the proper way of performing a rolling restart of a cluster? I
>>> currently have my stop script check for the cluster health to be green
>>> before stopping itself. Unfortunately this doesn't appear to be working.
>>>
>>> My setup:
>>> ES 1.0.0
>>> 3-node cluster w/ 1 replica
>>>
>>> When I perform the rolling restart I see the cluster still reporting a
>>> green state when a node is down. In theory that should be a yellow state
>>> since some shards will be unallocated. My script output during a rolling
>>> restart:
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> ... continues as green for many more seconds ...
>>>
>>> Since it is reporting as green, the second node thinks it can stop and
>>> ends up putting the cluster into a broken red state:
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>>
>>> My stop script issues a call to
>>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
>>> Is it possible the other nodes are waiting to time out the down node
>>> before moving into the yellow state? I would assume the shutdown API call
>>> would inform the other nodes that it is going down.
>>>
>>> Appreciate any help on how to do this properly.
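Putting Petter's meta code above into something closer to runnable shell, here is a sketch under a few assumptions: a 3-node cluster, placeholder hostnames in $NODES, an ssh service restart standing in for however each node is actually restarted, and the ES 1.x cluster.routing.allocation.enable transient setting (one of the allocation settings covered by the modules-cluster page Petter links):

  #!/usr/bin/env bash
  set -e

  NODES="es1.example.com es2.example.com es3.example.com"   # placeholder hostnames
  EXPECTED_NODES=3

  health() {
    # $1 = node to query, $2 = query string appended to the health URL
    curl -s "http://$1:9200/_cluster/health?$2"
  }

  set_allocation() {
    # $1 = node to query, $2 = "none" or "all"
    curl -s -XPUT "http://$1:9200/_cluster/settings" \
         -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$2\"}}" > /dev/null
  }

  for node in $NODES; do
    # pick any other node to talk to while $node is down
    other=$(echo $NODES | tr ' ' '\n' | grep -v "^$node$" | head -1)

    # only proceed while the cluster is green
    health "$other" "wait_for_status=green&timeout=30m" > /dev/null

    set_allocation "$other" none                    # stop shard rebalancing
    ssh "$node" sudo service elasticsearch restart  # placeholder restart command

    # per Ivan's note above: give the cluster a chance to register the node
    # loss before trusting the health checks below (bounded poll)
    for i in $(seq 1 120); do
      health "$other" | grep -q "\"number_of_nodes\":$EXPECTED_NODES" || break
      sleep 1
    done

    # wait for the restarted node to rejoin, then re-enable allocation
    health "$other" "wait_for_nodes=$EXPECTED_NODES&timeout=10m" > /dev/null
    set_allocation "$other" all

    # wait until all shards are reassigned before touching the next node
    health "$other" "wait_for_status=green&timeout=30m" > /dev/null
  done

The bounded poll after the restart covers the lag Mike describes: the wait_for_nodes and wait_for_status checks that follow are only trustworthy once the master has actually noticed the node leaving.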
