Re: How to remove three disks from three different nodes in a ten node cluster in less than an hour without losing replicas?

Colin McCabe Mon, 04 Feb 2013 14:54:26 -0800

It sounds like what you would like is a way to decommission just one
storage directory on the DataNode. We don't currently support that.

You might be able to get something approaching this result with
"chmod 000 $storage_directory_root".  That would at least prevent new
blocks from being created on the disk which you don't trust any more.  It
would also cause the existing blocks to be re-replicated when the
DirectoryScanner re-ran and noticed it couldn't get to them.  Note that I
haven't actually tested the chmod solution, though, so your mileage may vary.
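For illustration, here is the same `chmod 000` step as a small Python sketch. It assumes the DataNode process runs as an ordinary (non-root) user, because root is not stopped by permission bits. The function names and the example path are hypothetical and not part of Hadoop. The sketch only removes the permissions. Whether the DataNode notices and re-replicates the blocks is up to the DataNode, and, as noted above, that part is untested.

```python
import os
import stat


def fence_off_storage_dir(storage_dir: str) -> int:
    """Clear every permission bit on a storage directory root, the Python
    equivalent of `chmod 000 $storage_directory_root`.

    Returns the previous permission bits so the change can be undone."""
    old_mode = stat.S_IMODE(os.stat(storage_dir).st_mode)
    os.chmod(storage_dir, 0o000)
    return old_mode


def unfence_storage_dir(storage_dir: str, old_mode: int) -> None:
    """Put back the permission bits saved by fence_off_storage_dir."""
    os.chmod(storage_dir, old_mode)


if __name__ == "__main__":
    import tempfile

    # A throwaway directory stands in for a real storage directory root
    # (hypothetically something like /data/3/dfs/dn on a DataNode).
    root = tempfile.mkdtemp()
    saved = fence_off_storage_dir(root)
    print(oct(stat.S_IMODE(os.stat(root).st_mode)))  # 0o0
    unfence_storage_dir(root, saved)
    os.rmdir(root)
```

Returning the old mode keeps the change reversible if the admin backs out before swapping the disk.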

best,
Colin

On Wed, Jan 30, 2013 at 10:34 PM, Stack <st...@duboce.net> wrote:

> Here is a little puzzle.
>
> An admin works for a cash-strapped, popular web shop.  At the datacenter
> she has a ten node cluster that is heavily used.  It runs hot all day long
> and decommissioning a node, with its background re-replication of 12 disks'
> worth of data, messes up the workload she has on top of it and makes her
> clients very unhappy.  Replicating the data of one node takes at least an
> hour.  This cluster has three bad disks in three different nodes
> (replication factor is 3).  The admin lives an hour from the datacenter.
>  She can't afford a cage monkey and so must replace the disks herself.
>
> If she left home at 2pm and had to be back by 6pm before the kids came
> home from school, how could she replace the three disks while being sure
> not to lose a replica?
>
> Is the only answer to remove one, wait for a clean fsck run, then remove the next one?
>
> Thanks,
> St.Ack

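The risk in the puzzle is a block whose three replicas (replication factor 3) all sit on the three bad disks. Pulling all three disks at once would lose that block. Below is a minimal Python sketch of that check. It assumes a map from block ID to (node, disk) replica locations has already been gathered somehow. The data layout, block IDs, and function name are hypothetical and not an HDFS API.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical model: a replica location is a (node, disk) pair.
Location = Tuple[str, str]


def blocks_lost_if_removed(
    block_locations: Dict[str, Iterable[Location]],
    removed: Set[Location],
) -> List[str]:
    """Return the sorted IDs of blocks whose every known replica is on a
    disk in `removed`, i.e. blocks that would vanish if those disks were
    pulled at the same time."""
    lost = []
    for block_id, locations in block_locations.items():
        locs = list(locations)
        # Blocks with no known replica are skipped: they are already missing,
        # not newly lost by this removal.
        if locs and all(loc in removed for loc in locs):
            lost.append(block_id)
    return sorted(lost)


if __name__ == "__main__":
    # Three bad disks on three different nodes, as in the puzzle.
    bad_disks = {("node1", "disk3"), ("node4", "disk7"), ("node9", "disk2")}
    blocks = {
        # All three replicas on bad disks: lost if all three are pulled.
        "blk_1": [("node1", "disk3"), ("node4", "disk7"), ("node9", "disk2")],
        # One replica on a healthy disk: survives.
        "blk_2": [("node1", "disk3"), ("node2", "disk1"), ("node9", "disk2")],
    }
    print(blocks_lost_if_removed(blocks, bad_disks))  # ['blk_1']
```

An empty result means that, according to that snapshot of the map, no block is confined to the three bad disks. Blocks written after the snapshot are not covered. A non-empty result leaves the one-disk-at-a-time approach as the safe option. If only node-level locations are known, treating each affected node as a single "disk" gives a conservative answer. Any block flagged at disk level is also flagged at node level.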