We're still tweaking our own benchmarks, so we don't have any conclusive results yet. Has anyone done this kind of comparison before?
We're building a cluster of 40 machines with 5 drives each, and I'm
curious what people's experiences have been for using RAID-0 for HDFS
vs. configuring separate partitions (JBOD) and having the datanode
balance between them.
I took a look at the datanode code, and datanodes appear to write blocks
using a round-robin algorithm when managing multiple partitions. In
theory, the striping on RAID-0 should be more evenly balanced than this,
but RAID-0 doesn't seem to give a speedup proportionate to the number of
drives being striped. Furthermore, our initial tests seem to suggest
that the JBOD configuration spends less time in wait state than the
RAID-0 configuration when running disk-bound jobs.
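
To make the round-robin behaviour concrete, here is a rough sketch of how I understand the placement across partitions. This is not the actual datanode code: the names Volume and RoundRobinChooser, the /data/N paths, and the free-space check are all my own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Volume:
    """Hypothetical stand-in for one JBOD partition managed by a datanode."""
    path: str
    free_bytes: int


class RoundRobinChooser:
    """Picks the next volume in rotation, skipping volumes that are too full."""

    def __init__(self, volumes: List[Volume]) -> None:
        if not volumes:
            raise ValueError("at least one volume is required")
        self.volumes = volumes
        self._next = 0  # index of the volume to try first on the next call

    def choose(self, block_size: int) -> Volume:
        # Try each volume at most once, starting where the last call left off.
        for _ in range(len(self.volumes)):
            vol = self.volumes[self._next]
            self._next = (self._next + 1) % len(self.volumes)
            if vol.free_bytes >= block_size:
                vol.free_bytes -= block_size  # account for the new block
                return vol
        raise OSError("no volume has enough free space for the block")


if __name__ == "__main__":
    # Five drives per machine, as in the cluster described above.
    vols = [Volume(f"/data/{i}", 256) for i in range(5)]
    chooser = RoundRobinChooser(vols)
    print([chooser.choose(64).path for _ in range(10)])
```

Note that rotation like this balances the number of blocks per partition, not the bytes or the I/O load, so uneven block sizes or a hot partition can still skew things. Striping on RAID-0 splits every write across all drives instead.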