Hi.

> Depends on what kind of I/O you do - are you going to be using MapReduce
> and co-locating jobs and data? If so, it's possible to get close to those
> speeds if you are I/O bound in your job and read right through each chunk.
> If you have multiple disks mounted individually, you'll need the number of
> streams equal to the number of disks. If you're going to do I/O that's not
> through MapReduce, you'll probably be bound by the network interface.

Btw, this is what I wanted to ask as well: is it more efficient to combine the disks into one volume (RAID or LVM) and present them as a single space, or is it better to specify each disk separately? Reliability-wise, the latter sounds more correct, since one or several (up to 3) disks going down won't take the whole node with them. But is there perhaps a performance penalty?
