Hello all,
I'd like to push the datanode's capability to handle multiple data
directories to a somewhat extreme degree, and get feedback on how well
this might work.
We have a few large RAID servers (12 to 48 disks) which we'd like to
transition to Hadoop. I'd like to mount each of the disks
individually (i.e., /mnt/disk1, /mnt/disk2, ....) and take advantage
of Hadoop's replication - instead of paying the overhead of setting up
a RAID and then still paying the overhead of replication on top of it.
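For reference, a minimal sketch of what that configuration might look
like, assuming the standard dfs.data.dir property (the mount paths and
subdirectory names are just examples, not a recommendation):

```xml
<!-- hdfs-site.xml: point the datanode at each disk's mount point.
     The datanode spreads new blocks across the listed directories. -->
<property>
  <name>dfs.data.dir</name>
  <value>/mnt/disk1/hdfs/data,/mnt/disk2/hdfs/data,/mnt/disk3/hdfs/data</value>
</property>
```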
However, we're a bit concerned about how well Hadoop might handle one
of the directories disappearing out from underneath it. If a single
volume, say /mnt/disk1, starts returning I/O errors, is Hadoop smart
enough to figure out that the whole volume is broken? Or will we have
to restart the datanode after every disk failure so that it rescans
the directory and realizes everything there is broken? What happens if
you start up the datanode with a data directory that it can't write
into?
Is anyone running in this fashion (i.e., multiple data directories
corresponding to different disk volumes ... even better if you're
doing it with more than a few disks)?
Brian
- Datanode handling of single disk failure Brian Bockelman