Currently, Hadoop allocates blocks round-robin across multiple JBOD disks. We did some testing and found no significant differences between RAID-0 and JBOD. We went with JBOD because RAID-0 is more exposed to disk failures: any single disk failure in a 3-disk RAID-0 configuration causes the whole node to go down, whereas after a single disk failure in a JBOD configuration, Hadoop will go on serving from the other disks.
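
To make the reasoning above concrete, here is a rough Python sketch. It is not Hadoop's actual code: the function names, the directory paths, and the 5% per-disk failure probability are all made up for illustration, and it assumes disks fail independently. It shows (a) round-robin placement of blocks over three data directories and (b) why a 3-disk stripe is more likely to be lost than any one of its disks.

```python
from itertools import cycle


def place_blocks(block_ids, data_dirs):
    """Assign each block to the next data directory in turn (round-robin).

    Illustrative only; this is not how the datanode is implemented.
    """
    placement = {d: [] for d in data_dirs}
    next_dir = cycle(data_dirs)
    for block in block_ids:
        placement[next(next_dir)].append(block)
    return placement


def any_disk_fails(p_disk, n_disks):
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_disk) ** n_disks


# Hypothetical mount points, one per physical disk (JBOD).
data_dirs = ["/data1/hdfs", "/data2/hdfs", "/data3/hdfs"]
placement = place_blocks(range(9), data_dirs)

# Assumed per-disk failure probability over some period (made-up number).
p = 0.05
raid0_loss = any_disk_fails(p, 3)  # one failure takes out the whole stripe
jbod_single_disk_loss = p          # one failure takes out only that disk

for d, blocks in placement.items():
    print(d, blocks)
print(round(raid0_loss, 6), jbod_single_disk_loss)
```

With JBOD a disk failure is just as likely overall, but each failure costs only the blocks on that one disk (about a third of the node's blocks here), and those are re-replicated from other datanodes.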


On Jan 11, 2009, at 1:23 PM, David B. Ritch wrote:

How well does Hadoop handle multiple independent disks per node?

I have a cluster with 4 identical disks per node.  I plan to use one
disk for OS and temporary storage, and dedicate the other three to
HDFS.  Our IT folks have some disagreement as to whether the three
disks should be striped, or treated by HDFS as three independent
disks.  Could someone with more HDFS experience comment on the relative
advantages and disadvantages of each approach?

Here are some of my thoughts.  It's a bit easier to manage a 3-disk
striped partition, and we wouldn't have to worry about balancing files
between them.  Single-file I/O should be considerably faster.  On the
other hand, I would expect typical use to require multiple file reads
or writes simultaneously.  I would expect Hadoop to be able to manage
read/write to/from the disks independently.  Managing 3 streams to 3
independent devices would likely result in less disk head movement, and
therefore better performance.  I would expect Hadoop to be able to
balance load between the disks fairly well. Availability doesn't really
differentiate between the two approaches - if a single disk dies, the
striped array would go down, but all its data should be replicated on
another datanode anyway. And besides, I understand that the datanode
will shut down even if only one of 3 independent disks crashes.

So - anyone want to agree or disagree with these thoughts? Anyone have
any other ideas, or - better - benchmarks and experience with layouts
like these two?

Thanks!

David
