Guys, still having issues. Somehow my client and OSS both start getting CPU load when this happens.
The OSS says:

LustreError: 1538:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103 from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937
LustreError: 1560:0:(ldlm_lockd.c:1425:ldlm_cancel_handler()) operation 103 from 12345-10.65.200.37@tcp with bad export cookie 14320354116280279937
LustreError: 1714:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017031
LustreError: 1708:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017031
LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017040
LustreError: 1717:0:(filter_io.c:532:filter_preprw_write()) Skipped 10 previous similar messages
LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017174
LustreError: 1700:0:(filter_io.c:532:filter_preprw_write()) Skipped 5 previous similar messages
LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28016970
LustreError: 1688:0:(filter_io.c:532:filter_preprw_write()) Skipped 12 previous similar messages
LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017244
LustreError: 1697:0:(filter_io.c:532:filter_preprw_write()) Skipped 17 previous similar messages
LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) ost2: trying to BRW to non-existent file 28017244
LustreError: 1709:0:(filter_io.c:532:filter_preprw_write()) Skipped 48 previous similar messages
drbd1: [ll_ost_io_23/1690] sock_sendmsg time expired, ko = 4294967295
Lustre: 1689:0:(filter_io_26.c:714:filter_commitrw_write()) ost2: slow direct_io 30s
Lustre: 1689:0:(filter_io_26.c:727:filter_commitrw_write()) ost2: slow commitrw commit 30s

10.65.200.37 is my Lustre client; it reports:

LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 4 previous similar messages
LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2199:0:(file.c:754:ll_extent_lock_callback()) Skipped 3 previous similar messages
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2 req@c229d200 x1219484/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message
LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2 req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages
LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2 req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages

10.65.200.30 is my OSS. Both are generating load.

On Thu, Jan 27, 2011 at 3:17 PM, Nauman Yousuf <[email protected]> wrote:

> Hey, on the Lustre client I got this error:
>
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 17 previous similar messages
> LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2208:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 1 previous similar message
> LustreError: 2208:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2 req@c229bc00 x1219552/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 40 previous similar messages
> LustreError: 2188:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2188:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) @@@ type == PTL_RPC_MSG_ERR, err == -2 req@c22a3a00 x1219666/t0 o4->ost2_UUID@cyclops_UUID:28 lens 328/288 ref 2 fl Rpc:R/0/0 rc 0/-2
> LustreError: 2169:0:(client.c:576:ptlrpc_check_status()) Skipped 88 previous similar messages
> LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) client/server (nid 10.65.200.30@tcp) out of sync -- not fatal, flags 332c90
> LustreError: 2231:0:(ldlm_request.c:746:ldlm_cli_cancel()) Skipped 2 previous similar messages
> LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) ldlm_cli_cancel failed: 116
> LustreError: 2231:0:(file.c:754:ll_extent_lock_callback()) Skipped 2 previous similar messages
>
> On Wed, Jan 26, 2011 at 11:53 PM, Brian J. Murrell <[email protected]> wrote:
>
>> On Wed, 2011-01-26 at 22:24 +0500, Nauman Yousuf wrote:
>>
>> Your logs don't have timestamps so it's difficult to correlate events, but did you notice that right before you started getting these messages:
>>
>> > Lustre: 1588:0:(lustre_fsfilt.h:283:fsfilt_setattr()) mds01: slow setattr 31s
>> > Lustre: 1595:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 33s
>> > Lustre: 1720:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 32s
>> > Lustre: 1602:0:(lustre_fsfilt.h:182:fsfilt_start_log()) mds01: slow journal start 38s
>>
>> you got this:
>>
>> > drbd0: Resync started as SyncSource (need to sync 634747844 KB [158686961 bits set]).
>> > drbd0: Resync done (total 97313 sec; paused 0 sec; 6520 K/sec)
>> > drbd0: drbd0_worker [1126]: cstate SyncSource --> Connected
>>
>> I'm no DRBD expert by a long shot, but that looks to me like you had a disk in the MDS re-syncing to its DRBD partner. If that disk is the MDT, a resync, of course, is going to slow down the MDT.
>>
>> The problem here is that you are probably tuned (i.e. the number of threads) to expect full performance out of the hardware, and when it's under a resync load it won't deliver it.
>>
>> Unfortunately, at this point Lustre will push its thread count higher if it can determine it can get more performance out of a target, but it won't back off when things slow down (i.e. because the disk is being commandeered for housekeeping tasks such as a resync or RAID rebuild, etc.), so you need to set your maximum thread count to what performs well while your disks are under a resync load.
>>
>> Please see the operations manual for details on tuning thread counts for performance.
>>
>> Cheers,
>> b.

--
Regards
Nauman Yousuf
0321-2549206
E-Eager, N-Noble, G-Genuine, I-Intelligent, N-Natural, E-Enthusiastic, E-Energetic, R-Resourcefull --- ENGINEER
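For what it's worth, the thread-count check and cap that Brian describes could be scripted roughly as below. This is only a sketch under some assumptions: it assumes the OSS exposes the I/O service thread tunables under /proc/fs/lustre/ost/OSS/ost_io/ (threads_started, threads_min, threads_max), which 1.8-era releases do; on older releases the count is typically fixed at module load time with "options ost oss_num_threads=N" in /etc/modprobe.conf, so check the operations manual for your exact version, and the script name and the cap value 64 are just examples.

#!/usr/bin/env python
# Sketch (hypothetical oss_threads.py): report the OSS I/O service thread
# counters and optionally cap threads_max so the thread pool stops growing
# while the backing disks are busy with a DRBD resync or RAID rebuild.
# Assumes the 1.8-era /proc/fs/lustre/ost/OSS/ost_io/ layout; adjust the
# path, or use the ost module's oss_num_threads option, on other versions.
import os
import sys

OST_IO = "/proc/fs/lustre/ost/OSS/ost_io"

def read_param(name):
    # Each tunable is a small text file containing a single integer.
    with open(os.path.join(OST_IO, name)) as f:
        return f.read().strip()

def cap_threads_max(limit):
    # Writing a lower value prevents further thread-pool growth.
    with open(os.path.join(OST_IO, "threads_max"), "w") as f:
        f.write("%d\n" % limit)

if __name__ == "__main__":
    for name in ("threads_min", "threads_started", "threads_max"):
        print("%s = %s" % (name, read_param(name)))
    if len(sys.argv) > 1:
        # e.g. "python oss_threads.py 64", where 64 is whatever the OSTs
        # were measured to sustain while drbd1 is resyncing.
        cap_threads_max(int(sys.argv[1]))

The same values can of course be read and written by hand (cat/echo, or lctl get_param/set_param on releases that support it); the point is simply to pick the cap from a measurement taken while the resync is running, as Brian suggests.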
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
