On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
> I've seen a couple of references to ORNL using xdd versus sgp_dd for
> low-level disk performance benchmarking. Could you please summarize
> the differences and advise if our engineer team as well as Lustre
> partners should be considering this alternative?
We originally started using xdd for testing because it had features that made it easy to synchronize runs involving multiple hosts -- this is important for the testing we've been doing against LSI's XBB-2 system and DDN's 9900. For example, the 9900 was able to hit ~1550 to 1600 MB/s against a single IB port, but each singlet topped out at ~2650 to 2700 MB/s when hit by two hosts. Getting realistic aggregate numbers for both systems requires hitting them with four IO hosts or OSSes.

When run in direct IO (-dio) mode against the SCSI disk device on recent kernels, xdd takes a very similar path to Lustre's use case -- building up bios and using submit_bio() directly, without going through the page cache and triggering the read-ahead code and its associated problems. In this mode, xdd gave us an aggregate bandwidth of ~5500 MB/s, which matched up nicely against the ~5000 MB/s we obtained with an IOR run against a Lustre filesystem on the same hardware -- the expected ~10% hit for the filesystem versus raw disk. In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would indicate a maximum of 4400 MB/s from the array assuming perfect scaling. That would mean the filesystem achieved 113.6% of raw performance, which doesn't sit well.

That said, there are a few caveats to using xdd. The largest is that it does not issue perfectly sequential requests when run with a queue depth greater than 1: it uses multiple threads to keep more than one request in flight, so the requests are generally ascending but not perfectly sequential. This can cause performance regressions when the array does not internally reorder requests. Also, it is only possible to run xdd in direct IO mode against block devices on recent kernels -- 2.6.23, I believe, is the cutoff. On older kernels it must go through the page cache, which may cause lower performance to be measured.
Aborted shutdowns of xdd will often leave SysV semaphores orphaned, which requires manual cleanup once you hit the system limit.

It looks like it should be possible to run xdd in a manner suitable for sgpdd-survey, so that we could run tests against multiple regions of the disk at the same time, but I've not spent any time looking closely at that option.

I'm not sure why sgp_dd was getting lower numbers on the 2.6.24 kernel I was testing against -- there may be a performance regression with the SCSI generic devices.

Hope this helps; feel free to ask further questions.
-- 
Dave Dillow
National Center for Computational Science
Oak Ridge National Laboratory
(865) 241-6602 office
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
