Given an HDFS slave node setup with 3 disks per node, should I have 3 filesystems (one per disk) listed in dfs.data.dir, or a single filesystem on a JBOD setup spanning the 3 disks? Googling this problem suggests using "JBOD" instead of RAID 0, but I'm asking about two different kinds of JBOD: one managed by the OS (mdadm) or by firmware, exposing a single filesystem, and one managed by the DataNode itself, with multiple filesystems.
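
For reference, the "one filesystem per disk" option means dfs.data.dir holds a comma-separated list of local directories, one on each disk. The sketch below builds that hdfs-site.xml property; the mount points (/data/1 to /data/3) and the dfs/data subdirectory are hypothetical examples, not required names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mount points: one separate filesystem per physical disk.
MOUNT_POINTS = ["/data/1", "/data/2", "/data/3"]

def data_dir_property(mounts, subdir="dfs/data"):
    """Build an hdfs-site.xml <property> listing one data directory per disk."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.data.dir"
    # dfs.data.dir takes a comma-separated list of local directories.
    ET.SubElement(prop, "value").text = ",".join(f"{m}/{subdir}" for m in mounts)
    return prop

print(ET.tostring(data_dir_property(MOUNT_POINTS), encoding="unicode"))
```

With a single mdadm or firmware JBOD volume, the same property would instead list just one directory.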

I already prefer listing multiple filesystems in dfs.data.dir, since in theory the DataNode should handle block placement itself (instead of abstracting this away to the OS or firmware). When a drive dies, I could also, in theory, swap in a new drive without worrying about bringing down an entire JBOD array: I would only lose the blocks on the failed disk, without risking corruption at the filesystem level. In some ways I may already know the answer to my question; I'm just looking for anyone's experience with this datacenter-wide decision, or whether they prefer one approach over the other.
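
To illustrate the reasoning above, here is a simplified model, assuming the DataNode spreads new blocks over its dfs.data.dir volumes roughly round-robin (as I understand it, the real code also skips volumes without enough free space, which this sketch ignores). The VolumeSet class is hypothetical, not Hadoop's actual implementation. It shows that losing one volume loses only the blocks stored on it.

```python
from itertools import cycle

class VolumeSet:
    """Hypothetical model of a DataNode placing blocks over several volumes.

    Assumption: plain round-robin placement, with no free-space checks.
    """

    def __init__(self, volumes):
        self.blocks = {v: [] for v in volumes}
        self._next = cycle(volumes)

    def add_block(self, block_id):
        # Place the block on the next volume in round-robin order.
        volume = next(self._next)
        self.blocks[volume].append(block_id)
        return volume

    def fail_volume(self, volume):
        """Remove a failed volume and return the block IDs it held."""
        lost = self.blocks.pop(volume)
        # Continue round-robin over the surviving volumes only.
        self._next = cycle(list(self.blocks))
        return lost

vs = VolumeSet(["/data/1", "/data/2", "/data/3"])
for block_id in range(6):
    vs.add_block(block_id)
print(vs.fail_volume("/data/2"))  # prints [1, 4]
```

In a real cluster, the lost blocks would then be re-replicated from copies on other DataNodes, whereas a single filesystem spanning all three disks could become unusable as a whole when one disk fails.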


I'm trying to follow the approach described in this post:

http://old.nabble.com/forum/ViewPost.jtp?post=21423861&framed=y
