Re: Replace failing disks on a single node

vic hargrave Sat, 27 Sep 2014 08:12:36 -0700

The cluster goes to yellow fairly quickly but never reaches a green state. 
 If I knew that new replicas would be generated from the primaries when I 
add fresh disks, I would just go ahead and replace the failing disks at 
that point.
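
One way to tell whether the replicas are still being rebuilt or are simply stuck is to look at the shard counters in the cluster health response rather than only the colour. A minimal sketch, assuming the usual fields returned by GET /_cluster/health (status, initializing_shards, relocating_shards, unassigned_shards); the function name and the sample numbers are made up for illustration:

```python
def describe_health(health):
    """Classify a parsed GET /_cluster/health response (a dict)."""
    status = health["status"]
    # Shard copies that are currently being built or moved between nodes.
    moving = health["initializing_shards"] + health["relocating_shards"]
    unassigned = health["unassigned_shards"]

    if status == "green":
        return "green: all primaries and replicas are assigned"
    if status == "red":
        return "red: at least one primary shard is unassigned"
    if moving > 0:
        return "yellow: recovering, %d shard copies still in flight" % moving
    return "yellow: stuck, %d replicas unassigned and nothing recovering" % unassigned


# Illustrative numbers only, not taken from the real cluster.
sample = {"status": "yellow", "initializing_shards": 0,
          "relocating_shards": 0, "unassigned_shards": 5}
print(describe_health(sample))
```

If the result stays at "stuck" for a long time, the remaining nodes are not even trying to rebuild the missing replicas, which points at allocation settings rather than slow recovery.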


When I say "failing" disks, I mean the indicator lights on the disks in the 
system chassis indicate that they are exhibiting errors.  I can see that 
this affects the ingestion rate of the cluster so I want to replace them 
before they fail completely.  I have had this happen before with another 
system.  When disks start to go bad, Elasticsearch has trouble getting the 
cluster status of the node with the failing disk and slows to a crawl.  
It is best to replace disks before they fail completely when 
Elasticsearch is involved.

Anyhow, I think the Elasticsearch dev folks should think about this failure 
scenario.  It would be great if they added the capability to snapshot a 
single node after disabling shard reallocation, as described at
http://www.elasticsearch.org/guide/en/elasticsearch/reference/1.x/setup-upgrade.html
As it stands now, replacing a failing or failed disk in a node is a 
troublesome prospect.
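
For reference, the allocation toggle from that upgrade page boils down to two calls to the cluster settings API. A rough sketch, assuming a node reachable at http://localhost:9200 (an assumption, adjust as needed) and the transient setting cluster.routing.allocation.enable used in the 1.x docs; the helper names are hypothetical, and nothing is sent until set_allocation() is actually called:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed address, not from this thread


def allocation_body(mode):
    """Build the JSON body for PUT /_cluster/settings.

    This sketch only accepts "none" and "all", the two values needed here.
    """
    if mode not in ("none", "all"):
        raise ValueError("mode must be 'none' or 'all'")
    return json.dumps({"transient": {"cluster.routing.allocation.enable": mode}})


def set_allocation(mode, base_url=ES_URL):
    """Send the setting to the cluster and return the parsed response."""
    req = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=allocation_body(mode).encode("utf-8"),
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Intended order around the disk swap (not executed here):
#   set_allocation("none")  -> stop the node, replace disks, start the node
#   set_allocation("all")   -> let the cluster assign replicas to it again
print(allocation_body("none"))
```

This does not give a per-node snapshot. With the new disks empty, the node's shard copies would have to be rebuilt from the copies held on the other nodes, which is the "over the wire" route David suggests below.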

On Friday, September 26, 2014 8:16:50 PM UTC-7, David Pilato wrote:
>
> Is your cluster still yellow?
> It should be Green at some point unless you change some settings 
> explicitly.
>
> If your cluster is no longer indexing, you could manually copy the files 
> in the data dir to your new disk. But I wonder how you can copy from 
> a failing disk? 
>
>  I'd probably let elasticsearch do it over the wire.
>
> --
> David ;-)
> Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
>
>
> On 27 Sep 2014, at 02:30, vic hargrave <[email protected]> wrote:
>
> I have a situation where I need to replace disks that are failing on a 
> single node in my 4 node Elasticsearch cluster.  As a result I'd like to 
> back up the Elasticsearch data on that node only, replace the disks and then 
> restore the data to the new (empty) disks.  I've tried shutting down the 
> node in question, but the remaining 3 nodes can only get to a "yellow" 
> state.  I'm using 5 primary shards and 1 replica shard per index.  I 
> considered using snapshot for the single node, but it seems Elasticsearch 
> does not support snapshot and restore for a single node; it must be done on 
> the whole cluster.  
>
> Is it possible to just manually copy the data from the failing disk to 
> another disk, replace the failing disk then copy the data back to the new 
> disk (starting and stopping Elasticsearch before and after this whole 
> process, of course)? 
>
> -- vic
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/elasticsearch/42e5d7c4-a2ee-45da-bfe5-d0327011f52d%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>

