I just wanted to add one other published benchmark:
http://developer.yahoo.net/blogs/hadoop/2008/09/scaling_hadoop_to_4000_nodes_a.html
In this example, on a very busy cluster of 4000 nodes, both read and write
throughputs were close to the local disk bandwidth.
This benchmark (called TestDFSIO) uses large sequential writes and reads.
You can run it yourself on your hardware to compare.
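
A rough sketch of such a run, in case it helps. The jar path, file count and
file size below are placeholders I made up, not the settings used in the
published benchmark; the test jar name and location differ between Hadoop
releases, so adjust TEST_JAR for your install. The script only prints the
command if no hadoop binary is on the PATH.

```shell
#!/bin/sh
# Hypothetical defaults: adjust to your installation and cluster size.
TEST_JAR="${TEST_JAR:-${HADOOP_HOME:-.}/hadoop-test.jar}"  # assumed jar location
NR_FILES="${NR_FILES:-10}"                                 # number of files to write/read
FILE_SIZE_MB="${FILE_SIZE_MB:-1000}"                       # size of each file, in MB

# Build the TestDFSIO command line for one phase ($1 is -write or -read).
dfsio_cmd() {
  echo "hadoop jar $TEST_JAR TestDFSIO $1 -nrFiles $NR_FILES -fileSize $FILE_SIZE_MB"
}

# Run one phase, or just print the command when hadoop is not installed.
run_dfsio() {
  cmd=$(dfsio_cmd "$1")
  if command -v hadoop >/dev/null 2>&1; then
    $cmd   # relies on word splitting, so paths must not contain spaces
  else
    echo "hadoop not found; would run: $cmd"
  fi
}

run_dfsio -write   # write phase must come first, it creates the files
run_dfsio -read    # read phase reads the same files back
```

Run the write phase before the read phase, and compare the reported
throughput with what a single local disk delivers on the same hardware.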

Is it more efficient to unify the disks into one volume (RAID or LVM) and
then present them as a single space? Or is it better to specify each disk
separately?

There was a discussion recently on this list about RAID0 vs separate disks.
Please search the archives. Separate disks turn out to perform better.
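
For reference, separate disks are given to the datanode as a comma-separated
list of directories in dfs.data.dir, one directory per physical disk. A
minimal sketch, assuming four disks mounted at the made-up paths below (the
mount points are hypothetical, and the file that holds this property depends
on your Hadoop version):

```xml
<!-- Hypothetical mount points: use one directory per physical disk. -->
<property>
  <name>dfs.data.dir</name>
  <value>/disk1/dfs/data,/disk2/dfs/data,/disk3/dfs/data,/disk4/dfs/data</value>
</property>
```

The datanode then stores blocks in all of the listed directories.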

Reliability-wise, the latter sounds more correct, since one or several (up to
3) disks going down won't take the whole node with them. But perhaps there
is a performance penalty?

You always have block replicas on other nodes, so one node going down should 
not be a problem.

Thanks,
--Konstantin
