Thank you! I'm glad to hear that you have actually tested this. I believe that a failure of any disk - even with JBOD - will cause the DataNode to bring the node down. Presumably, we could bring it right back up, but this does sort of diminish the availability argument for JBOD.
Sounds like it's basically a toss-up. I'm a bit concerned about the potential for uneven distribution - both of the amount of data and of the transfer load - across the spindles. Unless I hear otherwise, I will probably go with RAID-0.

On Mon, Jan 12, 2009 at 12:17 PM, Colin Evans <[email protected]> wrote:
> Currently, Hadoop does round-robin allocation of blocks and data across
> multiple JBOD disks. We did some testing and found that there weren't
> significant differences between RAID-0 and JBOD. We went with JBOD because
> we figured that RAID-0 has a higher failure rate than JBOD -- any disk
> failure in a 3-disk RAID-0 configuration causes the whole node to go down,
> but if there is a single disk failure in a JBOD configuration, Hadoop will
> go on serving from the other disks.
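
For what it's worth, the failure-rate argument in the quoted message can be put into rough numbers. This is only a back-of-the-envelope sketch: it assumes disks fail independently, and the per-disk failure probability `p` and the helper name `any_disk_fails` are made up for illustration, not measured or taken from Hadoop.

```python
def any_disk_fails(p: float, disks: int) -> float:
    """Probability that at least one of `disks` independent disks fails."""
    return 1.0 - (1.0 - p) ** disks


p = 0.03   # hypothetical per-disk failure probability, illustration only
disks = 3  # the 3-disk configuration from the quoted message

# RAID-0: losing any one disk loses the whole stripe set, so this is also
# the probability that the node goes down.
raid0_node_down = any_disk_fails(p, disks)

# JBOD: the chance that some disk fails is exactly the same. What differs
# is only whether the node keeps serving from the surviving disks.
jbod_some_disk_fails = any_disk_fails(p, disks)

print(f"P(at least one of {disks} disks fails) = {raid0_node_down:.4f}")
print(f"Same for JBOD: {jbod_some_disk_fails == raid0_node_down}")
```

So the two layouts see a disk failure equally often, and the availability difference comes entirely from what happens afterwards. If a single failed disk does take the DataNode down under JBOD, as I believe it does, then the node-down probability is the same in both cases.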
