Three disks, each mounted separately. What you say is true: it will handle failures better and generally perform better. You'll need to configure the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml to make sure the DataNode handles a single failed volume gracefully.
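As a rough sketch, assuming the three disks are mounted at /data/1 through /data/3 (the mount points and directory layout here are hypothetical examples, not from the original thread), hdfs-site.xml would look something like:

```xml
<!-- hdfs-site.xml fragment: one filesystem per disk, each listed
     separately so the DataNode manages block placement itself. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn</value>
</property>
<property>
  <!-- Keep the DataNode running if up to this many volumes fail;
     with 0 (the default), any single disk failure stops the DataNode. -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```

With this setting, losing one of the three disks only takes the blocks on that disk offline; the DataNode keeps serving from the remaining volumes until the drive is replaced.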
-Joey

On Mon, Jan 30, 2012 at 4:57 PM, Aaron Tokhy <aaron.to...@resonatenetworks.com> wrote:
> Given a HDFS slave node setup of 3 disks per node, should I have 3
> filesystems (one file system per disk) in my dfs.data.dir listing, or should
> I have a single filesystem on a JBOD setup of 3 disks? Googling this
> problem suggests using "JBOD" instead of RAID 0, but I'm talking about two
> different kinds of JBOD: one managed by OS (mdadm) or firmware with a single
> filesystem, and the other managed by the DataNode (with multiple
> filesystems).
>
> I already have a preference for providing multiple filesystems in the
> dfs.data.dir listing, since theoretically the DataNode should properly handle
> where it places its blocks (instead of abstracting this to the OS or
> firmware). When a drive dies, I could also theoretically swap in a new
> drive without worrying about crashing an entire JBOD array (technically I
> only lose the blocks on the failing disk, without risking filesystem-level
> corruption). In some ways, I may already know the answer to my question;
> I'm just looking for anyone's experience with this datacenter-wide decision,
> or whether they have a preference for one method over another.
>
> I'm trying to go along the same lines as what is being done in this post:
>
> http://old.nabble.com/forum/ViewPost.jtp?post=21423861&framed=y

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434