Re: Rolling restart of a cluster?

Nikolas Everett Wed, 02 Apr 2014 08:43:27 -0700

I just used this to upgrade our labs environment a couple of days ago:

#!/bin/bash


export prefix=deployment-elastic0
export suffix=.eqiad.wmflabs
rm -f servers
for i in {1..4}; do
    echo $prefix$i$suffix >> servers
done

cat << __commands__ > /tmp/commands
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.0.deb
sudo dpkg -i --force-confdef --force-confold elasticsearch-1.1.0.deb
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
    "transient" : {
        "cluster.routing.allocation.enable": "primaries"
    }
}'
sudo /etc/init.d/elasticsearch restart
until curl -s localhost:9200/_cluster/health?pretty; do
    sleep 1
done
curl -s -XPUT localhost:9200/_cluster/settings?pretty -d '{
    "transient" : {
        "cluster.routing.allocation.enable": "all"
    }
}'
until curl -s localhost:9200/_cluster/health?pretty | tee /tmp/health | grep green; do
    cat /tmp/health
    sleep 1
done
__commands__

for server in $(cat servers); do
    scp /tmp/commands $server:/tmp/commands
    ssh $server bash /tmp/commands
done
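
A note on the health checks in the script above: they only look at the status colour. The health output quoted further down in this thread shows the cluster still reporting green with only 2 of 3 nodes present, so a stricter guard could also compare the node count. A minimal sketch, assuming those quoted lines are _cat/health output with the column order "epoch time cluster status node.total ..."; is_safe_to_restart is a hypothetical helper name, not part of the script above:

```shell
# Hypothetical helper: succeed only if a _cat/health line reports a green
# cluster AND the expected number of nodes.
#   $1 = one line of health output, $2 = expected node count
# Assumed columns: epoch time cluster status node.total node.data ...
is_safe_to_restart() (
    expected_nodes=$2
    set -f          # no glob expansion while splitting the line
    set -- $1       # split on whitespace: status becomes $4, node.total $5
    [ "$4" = "green" ] && [ "$5" = "$expected_nodes" ]
)

# Possible use on a node (not run here):
#   is_safe_to_restart "$(curl -s 'localhost:9200/_cat/health')" 4 || exit 1
```

With the quoted lines, "green 3 3 ..." passes for an expected count of 3, while "green 2 2 ..." is rejected even though the status is still green.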



Production will swap wget and dpkg for apt-get update and apt-get install
elasticsearch, but you get the idea.

It isn't foolproof.  If it dies, it doesn't know how to resume where it left
off, and you might have to kill it if the cluster doesn't come back the way
you'd expect.  It really only covers the "everything worked out as
expected" scenario.  But it is nice when that happens.
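
One way to avoid having to kill a hung run by hand would be to bound the wait loops. A rough sketch, not what the script above does: wait_for_green is a hypothetical helper that polls a command a fixed number of times and fails if "green" never shows up in its output (the same grep test as the loop in the script):

```shell
# Hypothetical helper: run a health command repeatedly until its output
# contains "green", giving up after a fixed number of attempts.
#   $1 = max attempts, $2 = seconds between attempts, rest = command to run
wait_for_green() (
    attempts=$1
    delay=$2
    shift 2
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" | grep -q green; then
            exit 0      # leaves only this subshell, so the function returns 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "cluster not green after $attempts attempts" >&2
    exit 1
)

# Possible use inside the per-server commands (not run here):
#   wait_for_green 300 1 curl -s 'localhost:9200/_cluster/health?pretty' || exit 1
```

The function would have to be defined in the commands file that gets copied to each server, and the outer ssh loop would need to stop when a server returns non-zero; both are left out here.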

Nik


On Wed, Apr 2, 2014 at 7:23 AM, Petter Abrahamsson <[email protected]> wrote:

> Mike,
>
> Your script needs to check the status of the cluster before shutting
> down a node, i.e. if the state is yellow, wait until it becomes green again
> before shutting down the next node. You'll probably want to disable
> allocation of shards while each node is being restarted (and re-enable it
> when the node comes back) in order to minimize the amount of data that needs
> to be rebalanced.
> Also make sure to have 'discovery.zen.minimum_master_nodes' correctly set
> in your elasticsearch.yml file.
>
> Meta code
>
> for node in $cluster_nodes; do
>   if [ "$cluster_status" = 'green' ]; then
>     cluster_disable_allocation
>     shutdown_node "$node"
>     wait_for_node_to_rejoin
>     cluster_enable_allocation
>     wait_for_cluster_status_green
>   fi
> done
>
>
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-cluster.html
>
> /petter
>
>
> On Tue, Apr 1, 2014 at 6:19 PM, Mike Deeks <[email protected]> wrote:
>
>> What is the proper way of performing a rolling restart of a cluster? I
>> currently have my stop script check for the cluster health to be green
>> before stopping itself. Unfortunately this doesn't appear to be working.
>>
>> My setup:
>> ES 1.0.0
>> 3 node cluster w/ 1 replica.
>>
>> When I perform the rolling restart I see the cluster still reporting a
>> green state when a node is down. In theory that should be a yellow state
>> since some shards will be unallocated. My script output during a rolling
>> restart:
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388310 21:38:30 dev_cluster green 3 3 1202 601 2 0 0
>>
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388312 21:38:32 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388313 21:38:33 dev_cluster green 3 3 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> 1396388314 21:38:34 dev_cluster green 3 3 1202 601 2 0 0
>> ... continues as green for many more seconds...
>>
>> Since it is reporting as green, the second node thinks it can stop and
>> ends up putting the cluster into a broken red state:
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388339 21:38:59 dev_cluster green 2 2 1202 601 2 0 0
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388341 21:39:01 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388342 21:39:02 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388343 21:39:03 dev_cluster yellow 2 2 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388345 21:39:05 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388346 21:39:06 dev_cluster yellow 1 1 664 601 2 8 530
>>
>> curl: (52) Empty reply from server
>> curl: (52) Empty reply from server
>> 1396388347 21:39:07 dev_cluster red 1 1 156 156 0 0 1046
>>
>> My stop script issues a call to
>> http://localhost:9200/_cluster/nodes/_local/_shutdown to kill the node.
>> Is it possible the other nodes are waiting to timeout the down node before
>> moving into the yellow state? I would assume the shutdown API call would
>> inform the other nodes that it is going down.
>>
>> Appreciate any help on how to do this properly.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/baba0a96-a991-42e3-a827-43881240e889%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
