I am wondering if I can use a tool like sysbench to get an _approximate idea_ of the performance of various disk setups (RAID-0, RAID-1, ext4, xfs, etc.) that would be used by HDFS. (I do understand that the real performance of HDFS/HBase depends on the final overall system and workload.)
For example, take sysbench:

    $ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw prepare
    $ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw run
    $ sysbench --num-threads=16 --test=fileio --file-total-size=3G --file-test-mode=rndrw cleanup

Here is what the above means:

---
Number of threads: 16
Extra file open flags: 0
128 files, 24Mb each
3Gb total file size
Block size 16Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Periodic FSYNC enabled, calling fsync() each 100 requests.
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random r/w test
---

One thing that jumps out at me is the block size (16Kb); it needs to be upped to 64M. Are there any other params I should tweak?

I would like to hear from the community what they have used to benchmark disk I/O.

thanks
Sujee
http://sujee.net
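
A note on the block size question: the output above reports 128 files of 24Mb each, so a 64M block would be larger than any single test file. Below is a minimal sketch of that arithmetic, assuming the sysbench 0.4-style options --file-num and --file-block-size (and that the block size accepts an M suffix). The 16G total and 16 files are placeholder values chosen for illustration, not recommendations, and the script only prints a candidate command rather than running it.

```shell
#!/bin/sh
# Sketch only: the sizes below are illustrative assumptions, not tuned values.

# per_file_mb TOTAL_GB NUM_FILES -> size of each test file in MB
per_file_mb() {
    echo $(( $1 * 1024 / $2 ))
}

# blocks_per_file TOTAL_GB NUM_FILES BLOCK_MB -> whole blocks that fit in one file
blocks_per_file() {
    echo $(( $(per_file_mb "$1" "$2") / $3 ))
}

TOTAL_GB=16     # hypothetical total size of the test files
NUM_FILES=16    # hypothetical file count (fewer, larger files)
BLOCK_MB=64     # the block size proposed in the post

echo "original: $(per_file_mb 3 128) MB per file, $(blocks_per_file 3 128 64) blocks of 64 MB fit"
echo "sketch:   $(per_file_mb $TOTAL_GB $NUM_FILES) MB per file, $(blocks_per_file $TOTAL_GB $NUM_FILES $BLOCK_MB) blocks of ${BLOCK_MB} MB fit"

# Print (do not run) a candidate command; the flags are assumed to match the installed sysbench.
echo "sysbench --num-threads=16 --test=fileio --file-total-size=${TOTAL_GB}G --file-num=${NUM_FILES} --file-block-size=${BLOCK_MB}M --file-test-mode=rndrw run"
```

With the original 3G / 128-file layout, no whole 64 MB block fits in a file, so the total size or the file count would have to change along with the block size. It is worth checking `sysbench --test=fileio help` on the installed version to confirm the option names before relying on them.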