We've thoroughly examined the back-end storage and the connections between the OSSs and the back-end, and there are currently no faults. Our couplet had previously lost cache sync, but that has since been resolved, yet the load issue remains.
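One way to test Paul's suggestion that I/O latency is driving the load is to look at which tasks make up the load average. On Linux, the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, so OST I/O threads blocked on slow back-end I/O can push the load up without using much CPU. The following is a minimal sketch, not a Lustre utility: it walks /proc/<pid>/task/<tid>/stat and counts D-state tasks by command name. The function names are hypothetical, and the assumption that the Lustre 1.8 OSS I/O threads show up with names like ll_ost_io_NN should be checked with ps on the OSS.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (comm, state) from the contents of a /proc/.../stat file.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so it is delimited by the first '(' and the
    *last* ')'. The one-letter state code is the next field.
    """
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return comm, state


def count_blocked(proc_root="/proc"):
    """Count tasks in uninterruptible sleep ('D'), keyed by command name.

    Walks every thread (/proc/<pid>/task/<tid>/stat), because the load
    average counts tasks, not processes.
    """
    counts = Counter()
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip entries such as /proc/self or /proc/meminfo
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited, or no task directory
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # task exited mid-scan or stat was malformed
            if state == "D":
                counts[comm] += 1
    return counts


if __name__ == "__main__":
    blocked = count_blocked()
    print("tasks in D state:", sum(blocked.values()))
    # Assumption: on a Lustre 1.8 OSS the I/O service threads are expected
    # to appear with names like ll_ost_io_NN; confirm with ps on the OSS.
    for comm, n in blocked.most_common(10):
        print(f"{n:5d}  {comm}")
```

Running it a few times on a loaded OSS, and on an OSS of the identical, healthy file system for comparison, would show whether the extra load comes from blocked I/O threads (pointing at latency somewhere in the I/O path) or from something else.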
On Oct 18, 2010, at 10:43 AM, Paul Nowoczynski wrote:

> I wonder if there's some type of fault in the I/O path which is increasing
> the latency of individual I/Os? Something like this could affect the load
> especially when considering the number of kernel threads on the OST.
> paul
>
> John White wrote:
>> Hello Folks,
>> A while back (say 3 weeks ago) we started noticing extremely high loads
>> (load avg around 300 at times) on our OSSs when in production and serving
>> IO. This cluster was, at the time, on 1.8.2 (we have since upgraded to
>> 1.8.4 but the problem remains). The load increases fairly predictably as
>> clients generate IO but even 2 clients can produce a load avg above 5.00.
>> An identical file system of ours does not exhibit this behavior (sticks
>> below load avg 1.00 under even the heaviest IO load). I've looked around
>> bugzilla and haven't found anything. We've disabled heartbeat on the
>> off-chance that was generating the load (it's not), we've attempted using a
>> different client transport (o2ib->tcp), this did not solve the issue. There
>> doesn't appear to be any specific non-kernel thread causing the high-load.
>> The only info in dmesg/syslog pertains to sporadic client evictions or
>> sporadic slow setattr due to heavy IO load (we've since tuned the number of
>> OST threads). We're basically out of ideas to try.
>>
>> As reference, this is a 1 MDS/4 OSS cluster backed by a DDN 9900 couplet (15
>> tiers, 1:1 lun mapping) running the lustre.org rpm build kernel for 1.8.4.
>> The MDS/OSSs are Dell R710s and the MDT is a Dell MD1000. Is this a common
>> problem or should a bug be filed? Any info available upon request. Thanks
>> for your time.
>> ----------------
>> John White
>> High Performance Computing Services (HPCS)
>> (510) 486-7307
>> One Cyclotron Rd, MS: 50B-3209C
>> Lawrence Berkeley National Lab
>> Berkeley, CA 94720
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
