On Monday 18 October 2010, John White wrote:
> Hello Folks,
> A while back (say 3 weeks ago) we started noticing extremely high loads
> (load avg around 300 at times) on our OSSs when in production and serving
> IO. This cluster was, at the time, on 1.8.2 (we have since upgraded to
> 1.8.4 but the problem remains). The load increases fairly predictably as
> clients generate IO but even 2 clients can produce a load avg above 5.00.

Does this impact performance or does it only show up as an unexpectedly high
number on the OSSes?

/Peter

> An identical file system of ours does not exhibit this behavior (sticks
> below load avg 1.00 under even the heaviest IO load). I've looked around
> bugzilla and haven't found anything. We've disabled heartbeat on the
> off-chance that was generating the load (it's not), we've attempted using a
> different client transport (o2ib->tcp), this did not solve the issue.
> There doesn't appear to be any specific non-kernel thread causing the
> high-load. The only info in dmesg/syslog pertains to sporadic client
> evictions or sporadic slow setattr due to heavy IO load (we've since tuned
> the number of OST threads). We're basically out of ideas to try.
>
> As reference, this is a 1 MDS/4 OSS cluster backed by a DDN 9900 couplet
> (15 tiers, 1:1 lun mapping) running the lustre.org rpm build kernel for
> 1.8.4. The MDS/OSSs are Dell R710s and the MDT is a Dell MD1000. Is this
> a common problem or should a bug be filed? Any info available upon
> request. Thanks for your time.
> ----------------
> John White
> High Performance Computing Services (HPCS)
> (510) 486-7307
> One Cyclotron Rd, MS: 50B-3209C
> Lawrence Berkeley National Lab
> Berkeley, CA 94720
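
One way to approach the question above: on Linux the load average counts tasks in uninterruptible sleep (state D, typically blocked on IO) as well as runnable tasks (state R), so a high figure does not by itself mean the CPUs are busy. The following is a minimal sketch, not something taken from this thread, that tallies task states from /proc so the two cases can be told apart. The function names and the sample thread name used in testing are illustrative assumptions.

```python
import glob


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def task_state(stat_text):
    """Extract the state letter from the contents of a /proc/<pid>/stat file."""
    # The command name sits in parentheses and may itself contain spaces or
    # ')', so the state is the first field after the LAST closing parenthesis.
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def count_states(stat_texts):
    """Tally how many tasks are in each state (R, S, D, ...)."""
    counts = {}
    for text in stat_texts:
        state = task_state(text)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_all_task_stats():
    """Read the stat file of every thread, kernel threads included."""
    texts = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                texts.append(f.read())
        except OSError:
            continue  # the task exited while we were scanning
    return texts


def report():
    """Print the load averages next to the current R and D task counts."""
    with open("/proc/loadavg") as f:
        print("load averages:", parse_loadavg(f.read()))
    counts = count_states(read_all_task_stats())
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

Calling report() on an OSS while clients are generating IO would show whether the load is dominated by D-state threads (waiting on storage) or by R-state threads (competing for CPU). It is a single snapshot, so running it a few times gives a better picture.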
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
