Far far from it. All OSTs are at most 23% full. There appear to be no lagging disks.
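
For anyone wanting to repeat the fullness check, it can be sketched roughly as below. This is a minimal Python sketch under stated assumptions, not the exact procedure used here: it assumes df-style text (such as the output of `lfs df`) with the target name in the first column, total blocks in the second and used blocks in the third. The function names `ost_usage` and `fullest` are hypothetical.

```python
def ost_usage(df_text):
    """Return {target_name: percent_used} for the OST rows of df-style text.

    Assumed row layout (an assumption, check it against your own output):
    name, total blocks, used blocks, available blocks, use%, mount point.
    """
    usage = {}
    for line in df_text.splitlines():
        fields = line.split()
        # Skip the header, MDT rows, summary rows, and blank lines.
        if len(fields) < 4 or "OST" not in fields[0]:
            continue
        try:
            total, used = int(fields[1]), int(fields[2])
        except ValueError:
            # Not a numeric row (e.g. an inactive target); ignore it.
            continue
        if total > 0:
            usage[fields[0]] = 100.0 * used / total
    return usage


def fullest(usage):
    """Return the (target_name, percent_used) pair with the highest usage."""
    if not usage:
        return None
    return max(usage.items(), key=lambda item: item[1])
```

Feeding it the captured text and calling `fullest(ost_usage(text))` gives the single most-used OST, which is the number to compare against the "at most 23%" figure above.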

On Oct 18, 2010, at 11:55 AM, Wojciech Turek wrote:

> Is this filesystem nearly full? Fragmentation can decrease back end
> performance.
>
> Also check the disks stats on the DDN, maybe you have a slow disk in one of
> your tiers.
>
> Wojciech
>
> On 18 October 2010 18:49, Peter Kjellstrom <[email protected]> wrote:
> On Monday 18 October 2010, John White wrote:
> > Hello Folks,
> > A while back (say 3 weeks ago) we started noticing extremely high loads
> > (load avg around 300 at times) on our OSSs when in production and serving
> > IO. This cluster was, at the time, on 1.8.2 (we have since upgraded to
> > 1.8.4 but the problem remains). The load increases fairly predictably as
> > clients generate IO but even 2 clients can produce a load avg above 5.00.
>
> Does this impact performance or does it only show up as an unexpectedly high
> number on the OSSes?
>
> /Peter
>
> > An identical file system of ours does not exhibit this behavior (sticks
> > below load avg 1.00 under even the heaviest IO load). I've looked around
> > bugzilla and haven't found anything. We've disabled heartbeat on the
> > off-chance that was generating the load (it's not), we've attempted using a
> > different client transport (o2ib->tcp), this did not solve the issue.
> > There doesn't appear to be any specific non-kernel thread causing the
> > high-load. The only info in dmesg/syslog pertains to sporadic client
> > evictions or sporadic slow setattr due to heavy IO load (we've since tuned
> > the number of OST threads). We're basically out of ideas to try.
> >
> > As reference, this is a 1 MDS/4 OSS cluster backed by a DDN 9900 couplet
> > (15 tiers, 1:1 lun mapping) running the lustre.org rpm build kernel for
> > 1.8.4. The MDS/OSSs are Dell R710s and the MDT is a Dell MD1000. Is this
> > a common problem or should a bug be filed? Any info available upon
> > request. Thanks for your time.
> > ----------------
> > John White
> > High Performance Computing Services (HPCS)
> > (510) 486-7307
> > One Cyclotron Rd, MS: 50B-3209C
> > Lawrence Berkeley National Lab
> > Berkeley, CA 94720
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> --
> Wojciech Turek
>
> Senior System Architect
>
> High Performance Computing Service
> University of Cambridge
> Email: [email protected]
> Tel: (+)44 1223 763517

----------------
John White
High Performance Computing Services (HPCS)
(510) 486-7307
One Cyclotron Rd, MS: 50B-3209C
Lawrence Berkeley National Lab
Berkeley, CA 94720
