It sounds like what you would like is a way to decommission just one storage directory on the DataNode. We don't currently support that.
You might be able to get something approaching this result with "chmod 000 $storage_directory_root". That would at least prevent new blocks from being created on the disk you no longer trust. It would also cause the existing blocks to be re-replicated once the DirectoryScanner re-ran and noticed it couldn't get to them. Note that I haven't actually tested the chmod solution, though, so your mileage may vary.

best,
Colin

On Wed, Jan 30, 2013 at 10:34 PM, Stack <st...@duboce.net> wrote:
> Here is a little puzzle.
>
> An admin works for a cash-strapped, popular web shop. At the datacenter
> she has a ten node cluster that is heavily used. It runs hot all day long
> and decommissioning a node with its background replicating of 12 disks
> worth of data messes up the work load she has on top of it and makes her
> clients very unhappy. Replicating the data of one node takes at least an
> hour. This cluster has three bad disks in three different nodes
> (replication factor is 3). The admin lives an hour from the datacenter.
> She can't afford a cage monkey and so must replace the disks herself.
>
> If she left home at 2pm and had to be back by 6pm before the kids came
> home from school, how would she replace the three disks without for sure
> losing a replica?
>
> Is the only answer remove one, wait on clean fsck run, remove the next one?
>
> Thanks,
> St.Ack