Re: Replace failing disks on a single node

Mark Walkom Sat, 27 Sep 2014 15:10:14 -0700

Disk issues are not really something ES should have to worry about. You
should either be running redundancy on the physical layer or accepting that,
if you don't, situations like this will occur.


If you remove the node and the cluster is yellow, then just replace the
disk. Yellow means some replica shards are unallocated, but all of your
primary shards are still OK. You can confirm this using the _cat API or a
visual tool like kopf.
When you add the node back, the cluster will rebalance and you should reach
green status again.
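
A minimal sketch of that check, assuming a node is reachable over HTTP at
localhost:9200 (the address, the function names and the column selection are
my own choices for illustration, not anything ES ships). It reads
_cluster/health and _cat/shards and reports whether every unassigned shard is
a replica:

```python
import json
from urllib.request import urlopen

# Assumption: adjust to any node that is still in the cluster.
ES_URL = "http://localhost:9200"


def fetch(path, base_url=ES_URL):
    """Return the body of a GET request against the cluster as text."""
    with urlopen(base_url + path, timeout=10) as resp:
        return resp.read().decode("utf-8")


def unassigned_shards(cat_shards_text):
    """Parse `_cat/shards?h=index,shard,prirep,state` output.

    Returns a list of (index, shard, prirep) tuples for every shard whose
    state is UNASSIGNED. prirep is "p" for a primary and "r" for a replica.
    """
    found = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if state == "UNASSIGNED":
            found.append((index, shard, prirep))
    return found


def only_replicas_missing(health, unassigned):
    """True when the cluster is not red and no primary shard is unassigned."""
    status_ok = health.get("status") in ("green", "yellow")
    return status_ok and all(prirep == "r" for _, _, prirep in unassigned)


def report(base_url=ES_URL):
    """Print a short summary; this is the only function that hits the network."""
    health = json.loads(fetch("/_cluster/health", base_url))
    missing = unassigned_shards(
        fetch("/_cat/shards?h=index,shard,prirep,state", base_url)
    )
    print("status:", health.get("status"))
    for index, shard, prirep in missing:
        print("unassigned:", index, shard, prirep)
    print("only replicas missing:", only_replicas_missing(health, missing))
```

Calling report() after shutting the node down should list only "r" rows if
the primaries all survived on the other nodes. If any "p" row shows up, or the
status is red, I would stop and investigate before touching the disks.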

The node snapshot does sound interesting, though, and might be useful. If
you want this functionality, it would be worth creating a GitHub issue with
the request.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: [email protected]
web: www.campaignmonitor.com

On 28 September 2014 01:12, vic hargrave <[email protected]> wrote:

> The cluster goes to yellow fairly quickly but never reaches a green
> state.  If I knew that new replicas would be generated from the primaries
> when I add fresh disks, I would just go ahead and replace the failing disks
> at that point.
>
> When I say "failing" disks, I mean the indicator lights on the disks in
> the system chassis indicate that they are exhibiting errors.  I can see
> that this affects the ingestion rate of the cluster so I want to replace
> them before they fail completely.  I have had this happen before with
> another system.  When disks start to go bad, Elasticsearch has trouble
> getting cluster status of the node with the failing disk and slows down to
> a crawl.  It is best to try to replace disks before they fail completely
> when Elasticsearch is involved.
>
> Anyhow, I think the Elasticsearch dev folks should think about this
> failure scenario.  It would be great if they added the capability to
> snapshot a single node after disabling shard reallocation -
> http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/setup-upgrade.html.
> As it stands now, replacing a failing or failed disk in a node is a
> troublesome prospect.
>
> On Friday, September 26, 2014 8:16:50 PM UTC-7, David Pilato wrote:
>>
>> Is your cluster still yellow?
>> It should be Green at some point unless you change some settings
>> explicitly.
>>
>> If your cluster is no longer indexing, you could manually copy the files
>> in the data dir onto your new disk. But I wonder how you can copy from
>> a failing disk?
>>
>>  I'd probably let elasticsearch do it over the wire.
>>
>> --
>> David ;-)
>> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>>
>>
>> On 27 Sep 2014, at 02:30, vic hargrave <[email protected]> wrote:
>>
>> I have a situation where I need to replace disks that are failing on a
>> single node in my 4-node Elasticsearch cluster.  As a result I'd like to
>> back up the Elasticsearch data on that node only, replace the disks and then
>> restore the data to the new (empty) disks.  I've tried shutting down the
>> node in question, but the remaining 3 nodes can only get to a "yellow"
>> state.  I'm using 5 primary shards and 1 replica shard per index.  I
>> considered using snapshot for the single node, but it seems Elasticsearch
>> does not support snapshot and restore for a single node; it must be done on
>> the whole cluster.
>>
>> Is it possible to just manually copy the data from the failing disk to
>> another disk, replace the failing disk then copy the data back to the new
>> disk (starting and stopping Elasticsearch before and after this whole
>> process, of course)?
>>
>> -- vic
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "elasticsearch" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/42e5d7c4-a2ee-45da-bfe5-d0327011f52d%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>

