Dave, thanks for the great response -- this could easily be elaborated into a short LCE whitepaper, btw.
I look forward to hearing from Andreas, Alex and other Lustre engineers on this.

Bojanic

On 4-May-08, at 17:40, David Dillow <[EMAIL PROTECTED]> wrote:
>
> On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
>> I've seen a couple of references to ORNL using xdd versus sgp_dd for
>> low-level disk performance benchmarking. Could you please summarize
>> the differences and advise whether our engineering team, as well as
>> Lustre partners, should be considering this alternative?
>
> We originally started using xdd for testing because it has features that
> make it easy to synchronize runs involving multiple hosts -- this is
> important for the testing we've been doing against LSI's XBB-2 system
> and DDN's 9900. For example, the 9900 was able to hit ~1550 MB/s to
> 1600 MB/s against a single IB port, but each singlet topped out at
> ~2650 to 2700 MB/s or so when hit by two hosts. Getting realistic
> aggregate numbers for both systems requires that we hit them with four
> IO hosts or OSSes.
>
> When run in direct IO (-dio) mode against the SCSI disk device on
> recent kernels, xdd takes a path very similar to Lustre's use case --
> building up bios and using submit_bio() directly, without going through
> the page cache and triggering the read-ahead code and its associated
> problems. In this mode, xdd gave us an aggregate bandwidth of
> ~5500 MB/s, which matched up nicely against the ~5000 MB/s we obtained
> with an IOR run against a Lustre filesystem on the same hardware. We
> saw the expected 10% hit from the filesystem vs. the raw disk.
>
> In contrast, sgp_dd gave us ~1100 MB/s from a single port, which would
> indicate a maximum of 4400 MB/s from the array assuming perfect
> scaling. That would mean the filesystem result was 113.6% of raw
> performance, which doesn't sit well.
>
> That said, there are a few caveats to using xdd -- the largest being
> that it does not issue perfectly sequential requests when run with a
> queue depth greater than 1. It uses multiple threads when it wants to
> have more than one request in flight, and that leads to requests that
> are generally ascending, but not perfectly sequential. This can cause
> performance regressions when the array does not internally reorder
> requests.
>
> It is only possible to run xdd in direct IO mode against block devices
> on recent kernels -- 2.6.23, I believe, is the cutoff. On older kernels
> it must go through the page cache, and that may cause lower performance
> to be measured.
>
> Aborted shutdowns of xdd will often leave SysV semaphores orphaned,
> which will require manual cleanup when you hit the system limit.
>
> It looks like it should be possible to run xdd in a manner suitable for
> sgpdd-survey, so that we could run tests against multiple regions of
> the disk at the same time. I've not spent any time looking closely at
> that option.
>
> I'm not sure why sgp_dd was getting lower numbers on the 2.6.24 kernel
> I was testing against -- there may be a performance regression with the
> SCSI generic devices.
>
> Hope this helps; feel free to ask further questions.
> --
> Dave Dillow
> National Center for Computational Science
> Oak Ridge National Laboratory
> (865) 241-6602 office
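To make the direct IO mode Dave describes concrete, here is a minimal sketch
of a single-host xdd read pass. The device path, sizes, and pass count are
placeholders, and the exact flag spellings are an assumption based on common
xdd builds -- check xdd's own help output before running:

  # Hypothetical direct-IO read pass against a raw SCSI disk. -dio is the
  # direct IO flag discussed above; -queuedepth > 1 keeps more requests in
  # flight at the cost of strict sequentiality (see Dave's caveat).
  xdd -op read -targets 1 /dev/sdc \
      -dio \
      -blocksize 512 -reqsize 2048 \
      -queuedepth 4 \
      -numreqs 16384 \
      -passes 3 -verbose

Here each request is reqsize x blocksize = 1 MB, so one pass covers
16384 MB of the disk.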
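For comparison, a roughly equivalent sgp_dd read of the same region would go
through the SCSI generic device rather than the block layer. The sg device
name and counts below are placeholders, not taken from Dave's runs:

  # Hypothetical sgp_dd read; thr= is the thread count, bpt= the blocks
  # per transfer, time=1 prints timing/throughput at the end.
  sgp_dd if=/dev/sg2 of=/dev/null bs=512 bpt=2048 thr=4 count=32m time=1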
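On the orphaned-semaphore caveat: the standard ipcs/ipcrm tools are enough
for the manual cleanup Dave mentions. A small sketch, assuming nothing else
of yours is using SysV semaphores on the box:

  # List SysV semaphore arrays, then remove those owned by the current
  # user (e.g. ones left behind by an aborted xdd run).
  # WARNING: this removes ALL of the invoking user's semaphores.
  ipcs -s
  for id in $(ipcs -s | awk -v u="$USER" '$3 == u {print $2}'); do
      ipcrm -s "$id"
  done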
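And on running xdd in an sgpdd-survey-like manner: one plausible approach is
several concurrent xdd instances, each confined to its own region of the
disk. This sketch assumes xdd supports a -startoffset option taking an
offset in -blocksize units (true of the builds I've seen, but verify); all
numbers are placeholders:

  # Hypothetical multi-region run: four concurrent readers, each working
  # a distinct ~1 GiB region (2097152 blocks of 512 bytes).
  REGION_BLOCKS=2097152
  for region in 0 1 2 3; do
      xdd -op read -targets 1 /dev/sdc -dio \
          -blocksize 512 -reqsize 2048 -queuedepth 1 \
          -startoffset $((region * REGION_BLOCKS)) \
          -numreqs 1024 &
  done
  wait

Each instance reads numreqs x reqsize = 2097152 blocks, so the regions tile
the first 4 GiB of the disk without overlapping.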