The problem is, what do you define as a failure?  If the disk is failing, 
writes to the filesystem will fail - how does Hadoop differentiate between a 
permissions problem and a physical disk failure?  They both return an error.

And yeah, the idea of stopping the datanode, removing the affected mount from 
hdfs-site.xml, and restarting has been discussed.  The problem is, when that 
disk gets replaced and re-added, I end up with horrible internal balance 
issues - which is exactly the problem I have now :(
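
For reference, this is the sort of edit I mean - a sketch only, with made-up 
mount points; the data dir property is dfs.data.dir on older releases and 
dfs.datanode.data.dir on newer ones.  The failed mount (say /data/5/dfs) just 
gets dropped from the list before the datanode is restarted:

  <property>
    <name>dfs.data.dir</name>
    <!-- /data/5/dfs removed from the list because its disk failed -->
    <value>/data/1/dfs,/data/2/dfs,/data/3/dfs,/data/4/dfs,/data/6/dfs</value>
  </property>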

-j

On Jan 3, 2011, at 9:07 AM, Eli Collins wrote:

> Hey Jonathan,
> 
> There's an option (dfs.datanode.failed.volumes.tolerated, introduced
> in HDFS-1161) that allows you to specify the number of volumes that
> are allowed to fail before a datanode stops offering service.
> 
> There's an operational issue that still needs to be addressed
> (HDFS-1158) that you should be aware of - the DN will still not start
> if any of the volumes have failed, so to restart the DN you'll need to
> either unconfigure the failed volumes or fix them. I'd
> like to make DN startup respect the config value so it tolerates
> failed volumes on startup as well.
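> 
> A minimal hdfs-site.xml sketch of that setting (the value of 1 here is just 
> an example - it's the number of volumes allowed to fail before the DN takes 
> itself out of service):
> 
>   <property>
>     <name>dfs.datanode.failed.volumes.tolerated</name>
>     <value>1</value>
>   </property>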
> 
> Thanks,
> Eli
> 
> On Sun, Jan 2, 2011 at 7:20 PM, Jonathan Disher <jdis...@parad.net> wrote:
>> I see that there was a thread on this in December, but I can't retrieve it 
>> to reply properly, oh well.
>> 
>> So, I have a 30 node cluster (plus separate namenode, jobtracker, etc).  
>> Each is a 12 disk machine - two mirrored 250GB OS disks, ten 1TB data disks 
>> in JBOD.  Original system config was six 1TB data disks - we added the last 
>> four disks months later.  As I'm sure you can all guess, we have some 
>> interesting internal usage balancing issues on most of the nodes.  To date, 
>> when individual disks get critically low on space (earlier this week I had a 
>> node with six disks around 97% full and four around 70%), we've been pulling 
>> those nodes from the cluster, formatting the data disks, and sticking them back in 
>> (with a rebalance running to keep the cluster in some semblance of order).
>> 
>> Obviously, if there's a better way to do this, I'd love to see it.  I see 
>> that people recommend killing the DataNode process and manually 
>> moving files, but my concern is that the DataNode process will spend an 
>> enormous amount of time tracking down these moves (currently around 820,000 
>> blocks/node).  And it's not necessarily easy to automate, so there's the 
>> danger of nuking blocks and making the problems worse.  Are there 
>> alternatives to manual moves (or more automated approaches)?  Or does my 
>> brute-force rebalance have the best chance of success, albeit slowly?
>> 
>> We are also building a new cluster - starting around 1.2PB raw, eventually 
>> growing to around 5PB, for near-line storage of data.  Our storage nodes 
>> will probably be 4U systems with 72 data disks each (yeah, good times).  The 
>> problem with this becomes obvious - with the way Hadoop works today, if a 
>> disk fails, the datanode process chokes and dies when it tries to write to 
>> it.  We've been told repeatedly that Hadoop doesn't perform well when it 
>> operates on RAID arrays, but to scale effectively we're going to have to 
>> do just that - three 24-disk controllers in RAID-6 mode.  How bad is this 
>> going to be?  JBOD just doesn't scale beyond a couple disks per machine - the 
>> failure rate will knock machines out of the cluster too often (and at 60TB 
>> per node, rebalancing will take forever, even if I let it saturate gigabit).
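>> 
>> For what it's worth, the balancer throttle in play there is 
>> dfs.balance.bandwidthPerSec in hdfs-site.xml (renamed 
>> dfs.datanode.balance.bandwidthPerSec in newer releases) - it's bytes per 
>> second per datanode, and the default is a very conservative 1 MB/s.  A 
>> sketch, with a made-up value of roughly 100 MB/s to get near gigabit line 
>> rate:
>> 
>>   <property>
>>     <name>dfs.balance.bandwidthPerSec</name>
>>     <!-- ~100 MB/s; the default of 1048576 (1 MB/s) would crawl here -->
>>     <value>104857600</value>
>>   </property>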
>> 
>> I appreciate opinions and suggestions.  Thanks!
>> 
>> -j
