On May 04, 2008  21:45 -0300, Peter Bojanic wrote:
> Dave, thanks for the great response -- this could easily be elaborated
> as a short LCE whitepaper, btw.
>
> I look forward to hearing from Andreas, Alex and other Lustre
> engineers on this.
I haven't personally been using sgp_dd or xdd very much, but the
requirement for kernels >= 2.6.23 pretty much rules this out for use at
most of our customers, since the latest vendor kernel (RHEL5) is based
on 2.6.18. As for the issue of multi-threaded processes not having
perfectly sequential IO, that is fine too, because the way we use
sgp_dd already has similar issues, as do Lustre OSTs.

> On 4-May-08, at 17:40, David Dillow <[EMAIL PROTECTED]> wrote:
> >
> > On Sat, 2008-05-03 at 11:54 -0700, Peter Bojanic wrote:
> >> I've seen a couple of references to ORNL using xdd versus sgp_dd for
> >> low-level disk performance benchmarking. Could you please summarize
> >> the differences and advise whether our engineering team, as well as
> >> Lustre partners, should be considering this alternative?
> >
> > We originally started using xdd for testing as it had features that
> > made it easy to synchronize runs involving multiple hosts -- this is
> > important for the testing we've been doing against LSI's XBB-2 system
> > and DDN's 9900. For example, the 9900 was able to hit ~1550 to
> > 1600 MB/s against a single IB port, but each singlet topped out at
> > ~2650 to 2700 MB/s or so when hit by two hosts. Getting realistic
> > aggregate numbers for both systems requires that we hit them with
> > four IO hosts or OSSes.
> >
> > When run in direct IO (-dio) mode against the SCSI disk device on
> > recent kernels, xdd takes a very similar path to Lustre's use case --
> > building up bio's and using submit_bio() directly, without going
> > through the page cache and triggering the read-ahead code and its
> > associated problems. In this mode, xdd gave us an aggregate bandwidth
> > of ~5500 MB/s, which matched up nicely against the ~5000 MB/s we
> > obtained with an IOR run against a Lustre filesystem on the same
> > hardware. We saw the expected 10% hit from the filesystem vs raw
> > disk.
> >
> > In contrast, sgp_dd gave us ~1100 MB/s from a single port, which
> > would indicate a maximum of 4400 MB/s from the array assuming perfect
> > scaling. That would mean we got a result on the filesystem of 113.6%
> > of raw performance, which doesn't sit well.
> >
> > That said, there are a few caveats to using xdd -- the largest being
> > that it does not issue perfectly sequential requests when run with a
> > queue depth greater than 1. It uses multiple threads when it wants
> > more than one request in flight, which leads to requests that are
> > generally ascending, but not perfectly sequential. This can cause
> > performance regressions when the array does not internally reorder
> > requests.
> >
> > It is only possible to run xdd in direct IO mode against block
> > devices in recent kernels -- 2.6.23, I believe, is the cutoff. In
> > older kernels it must go through the page cache, and that may cause
> > lower performance to be measured.
> >
> > Aborted shutdowns of xdd will often leave SysV semaphores orphaned,
> > which will require manual cleanup when you hit the system limit.
> >
> > It looks like it should be possible to run xdd in a manner suitable
> > for sgpdd-survey, so that we could run tests against multiple regions
> > of the disk at the same time. I've not spent any time looking closely
> > at that option.
> >
> > I'm not sure why sgp_dd was getting lower numbers on the 2.6.24
> > kernel I was testing against -- there may be a performance regression
> > with the SCSI generic devices.
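For anyone who wants to reproduce this style of testing, the sketches
below show roughly what the invocations look like. These are from
memory and untested (the device names are placeholders, and option
syntax varies between tool versions), so check the man pages or -help
output before running them:

  # xdd: sequential direct-IO read of a raw block device; 512 B blocks
  # x 2048 blocks = 1 MB requests, 4 requests in flight (per Dave's
  # caveat, queuedepth > 1 means only roughly sequential requests)
  xdd -op read -targets 1 /dev/sdc -dio -blocksize 512 -reqsize 2048 \
      -queuedepth 4 -mbytes 8192 -passes 3

  # sgp_dd: a comparable read through the SCSI generic device, 8
  # threads, 1 MB per transfer (bs * bpt), 2 GB total, with timing
  sgp_dd if=/dev/sg2 of=/dev/null bs=512 bpt=2048 thr=8 count=4194304 time=1

As for the orphaned semaphores Dave mentions below, the standard SysV
IPC tools should be enough to clean up after an aborted run:

  ipcs -s           # list semaphore sets; note the semid and owner
  ipcrm -s <semid>  # remove each set left behind by the dead xdd
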
> > Hope this helps, feel free to ask further questions.
> > --
> > Dave Dillow
> > National Center for Computational Science
> > Oak Ridge National Laboratory
> > (865) 241-6602 office

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss