On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:

Your logs don't have timestamps, so it's difficult to correlate events,
but did you notice that right before you started getting these messages:

> Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s
> Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s
> Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s
> Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s

You got this:

> drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
> drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
> drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected

I'm no DRBD expert by a long shot, but that looks to me like you had a
disk in the MDS re-syncing to its DRBD partner. If that disk is the MDT,
a resync is, of course, going to slow down the MDT.

The problem here is that you are probably tuned (i.e. the number of
threads) to expect full performance out of the hardware, and when it's
under a resync load, it won't deliver it.

Unfortunately, at this point Lustre will push its thread count higher if
it can determine it can get more performance out of a target, but it
won't back off when things slow down (i.e. because the disk is being
commandeered for housekeeping tasks such as resync or RAID rebuild,
etc.), so you need to cap your thread count at what performs well while
your disks are under a resync load.

Please see the operations manual for details on tuning thread counts for
performance.

Cheers,
b.
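
To put a number on how long the MDT was under resync load, the two drbd0 lines quoted above can be checked with a few lines of Python. This is only a sketch: the regular expressions are written against the exact wording of those two lines, and other DRBD versions may well format them differently. The function name parse_resync and the dictionary keys are made up for illustration.

```python
import re

# Log lines copied from the message above.
LOG = """\
drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
"""

def parse_resync(log):
    """Pull size, bitmap bit count, duration and reported rate out of the log.

    Hypothetical helper: the patterns assume the line format shown above.
    """
    start = re.search(r"need to sync (\d+) KB \[(\d+) bits set\]", log)
    done = re.search(
        r"Resync done \(total (\d+) sec; paused (\d+) sec; (\d+) K/sec\)", log
    )
    if not start or not done:
        raise ValueError("resync start/done lines not found")
    return {
        "size_kb": int(start.group(1)),
        "bits": int(start.group(2)),
        "total_sec": int(done.group(1)),
        "paused_sec": int(done.group(2)),
        "reported_rate": int(done.group(3)),
    }

info = parse_resync(LOG)
hours = info["total_sec"] / 3600                 # wall-clock length of the resync
avg_rate = info["size_kb"] / info["total_sec"]   # average KB/s over the whole run
kb_per_bit = info["size_kb"] / info["bits"]      # data covered by each bitmap bit

print(f"resync ran for {hours:.1f} hours")
print(f"average rate {avg_rate:.0f} KB/s (DRBD reported {info['reported_rate']} K/sec)")
print(f"{kb_per_bit:.0f} KB per bitmap bit")
```

With the figures from the log, the resync works out to roughly 27 hours at an average rate close to the one DRBD reported, which is a long window for the MDT to be running below the performance the thread count was tuned for.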
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
