Hey, on a Lustre client I got these errors:
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message
LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages
LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2  req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages

On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <[email protected]> wrote:

> On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:
>
> Your logs don't have timestamps, so it's difficult to correlate events,
> but did you notice that right before you started getting these messages:
>
> > Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s
> > Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s
> > Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s
> > Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s
>
> you got this:
>
> > drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
> > drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
> > drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected
>
> I'm no DRBD expert by a long shot, but that looks to me like you had a
> disk in the MDS re-syncing to its DRBD partner. If that disk is the
> MDT, a resync is of course going to slow down the MDT.
>
> The problem here is that you are probably tuned (i.e. the number of
> threads) to expect full performance out of the hardware, and when it's
> under a resync load, it won't deliver that.
>
> Unfortunately, at this point Lustre will push its thread count higher if
> it can determine it can get more performance out of a target, but it won't
> back off when things slow down (i.e. because the disk is being
> commandeered for housekeeping tasks such as a resync or RAID rebuild,
> etc.), so you need to cap your thread count at what performs well
> while your disks are under a resync load.
>
> Please see the operations manual for details on tuning thread counts for
> performance.
>
> Cheers,
> b.
>
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
