My scripts wait for yellow before waiting for green because, as you noticed, the cluster does not enter a yellow state immediately after a cluster event (shutdown, replica change).
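For reference, a minimal sketch of that double wait (assuming curl, a node that stays up during the restart reachable at $ES_URL, and the _cluster/health API's wait_for_status and timeout parameters; note that wait_for_status=yellow returns immediately while the cluster still reports green, so the drop from green has to be polled for explicitly):

  ES_URL="http://localhost:9200"   # point this at a node that stays up

  # 1. Give the cluster a chance to actually register the node loss,
  #    i.e. poll (bounded) until the health status drops from green.
  for i in $(seq 1 60); do
    curl -s "$ES_URL/_cluster/health" | grep -q '"status":"green"' || break
    sleep 1
  done

  # 2. Now block until all shards are assigned and the cluster is green again.
  curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=30m" > /dev/null

The bounded poll in step 1 is what keeps a rolling-restart script from racing ahead while the cluster still briefly reports green after a node has gone down.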
--
Ivan

On Wed, Apr 2, 2014 at 11:08 AM, Mike Deeks <[email protected]> wrote:

> That is exactly what I'm doing. For some reason the cluster reports as
> green even though an entire node is down. The cluster doesn't seem to
> notice the node is gone and change to yellow until many seconds later. By
> then my rolling restart script has already gotten to the second node and
> killed it because the cluster was still green.
>
> On Wednesday, April 2, 2014 4:23:32 AM UTC-7, Petter Abrahamsson wrote:
>
>> Mike,
>>
>> Your script needs to check the status of the cluster before shutting
>> down a node, i.e. if the state is yellow, wait until it becomes green
>> again before shutting down the next node. You'll probably want to disable
>> allocation of shards while each node is being restarted (and enable it
>> when the node comes back) in order to minimize the amount of data that
>> needs to be rebalanced.
>> Also make sure 'discovery.zen.minimum_master_nodes' is correctly set in
>> your elasticsearch.yml file.
>>
>> Meta code:
>>
>> for node in $cluster_nodes; do
>>   if [ $cluster_status == 'green' ]; then
>>     cluster_disable_allocation()
>>     shutdown_node($node)
>>     wait_for_node_to_rejoin()
>>     cluster_enable_allocation()
>>     wait_for_cluster_status_green()
>>   fi
>> done
>>
>> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
>>
>> /petter
>>
>> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks <[email protected]> wrote:
>>
>>> What is the proper way of performing a rolling restart of a cluster? I
>>> currently have my stop script check for the cluster health to be green
>>> before stopping itself. Unfortunately this doesn't appear to be working.
>>>
>>> My setup:
>>> ES 1.0.0
>>> 3-node cluster w/ 1 replica
>>>
>>> When I perform the rolling restart I see the cluster still reporting a
>>> green state when a node is down. In theory that should be a yellow state
>>> since some shards will be unallocated. My script output during a rolling
>>> restart:
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>>> ... continues as green for many more seconds ...
>>>
>>> Since it is reporting as green, the second node thinks it can stop and
>>> ends up putting the cluster into a broken red state:
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>>
>>> curl: (52) Empty reply from server
>>> curl: (52) Empty reply from server
>>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>>
>>> My stop script issues a call to
>>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
>>> Is it possible the other nodes are waiting to time out the down node
>>> before moving into the yellow state? I would assume the shutdown API call
>>> would inform the other nodes that it is going down.
>>>
>>> Appreciate any help on how to do this properly.
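Putting Petter's meta code above into something closer to runnable shell, here is a sketch under a few assumptions: a 3-node cluster, placeholder hostnames in $NODES, an ssh service restart standing in for however each node is actually restarted, and the ES 1.x cluster.routing.allocation.enable transient setting (one of the allocation settings covered by the modules-cluster page Petter links):

  #!/usr/bin/env bash
  set -e

  NODES="es1.example.com es2.example.com es3.example.com"   # placeholder hostnames
  EXPECTED_NODES=3

  health() {
    # $1 = node to query, $2 = query string appended to the health URL
    curl -s "http://$1:9200/_cluster/health?$2"
  }

  set_allocation() {
    # $1 = node to query, $2 = "none" or "all"
    curl -s -XPUT "http://$1:9200/_cluster/settings" \
         -d "{\"transient\":{\"cluster.routing.allocation.enable\":\"$2\"}}" > /dev/null
  }

  for node in $NODES; do
    # pick any other node to talk to while $node is down
    other=$(echo $NODES | tr ' ' '\n' | grep -v "^$node$" | head -1)

    # only proceed while the cluster is green
    health "$other" "wait_for_status=green&timeout=30m" > /dev/null

    set_allocation "$other" none                    # stop shard rebalancing
    ssh "$node" sudo service elasticsearch restart  # placeholder restart command

    # per Ivan's note above: give the cluster a chance to register the node
    # loss before trusting the health checks below (bounded poll)
    for i in $(seq 1 120); do
      health "$other" | grep -q "\"number_of_nodes\":$EXPECTED_NODES" || break
      sleep 1
    done

    # wait for the restarted node to rejoin, then re-enable allocation
    health "$other" "wait_for_nodes=$EXPECTED_NODES&timeout=10m" > /dev/null
    set_allocation "$other" all

    # wait until all shards are reassigned before touching the next node
    health "$other" "wait_for_status=green&timeout=30m" > /dev/null
  done

The bounded poll after the restart covers the lag Mike describes: the wait_for_nodes and wait_for_status checks that follow are only trustworthy once the master has actually noticed the node leaving.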
