Hi! While debugging connection problems on a Lustre client running 1.6.5.1 on RHEL5, I discovered early replies in the client debug output. Adaptive timeouts are disabled on the client, and our server infrastructure in running stock 1.6.4.3 (RHEL5), still. As far as I understood, early replies should only occur post 1.6.5 when adaptive timeouts are active. Did I get this wrong, and early replies are prefectly valid even in our setup? Or is the 1.6.5.1 client misinterpreting message headers?
I'm attaching a debug trace for a single XID on the client and the MDS/MGS. Thanks, Daniel.
==========> client ([EMAIL PROTECTED], Lustre 1.6.5.1) <========== [EMAIL PROTECTED] # lctl debug_file /tmp/lustre-debug.client.log | grep 266023 00000100:00100000:2:1217244344.471204:0:9991:0:(client.c:1784:ptlrpc_queue_wait()) Sending RPC pname:cluuid:pid:xid:nid:opc ll_cfg_requeue:38cc9155-5e64-1d01-bbf2-8b621120e1b0:9991:x266023:[EMAIL PROTECTED]:101 00000100:00000200:2:1217244344.471250:0:9991:0:(niobuf.c:540:ptl_send_rpc()) Setup reply buffer: 368 bytes, xid 266023, portal 25 00000100:00000040:2:1217244344.471269:0:9991:0:(niobuf.c:561:ptl_send_rpc()) @@@ send flg=0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000200:2:1217244344.471299:0:9991:0:(niobuf.c:70:ptl_send_buf()) Sending 232 bytes to portal 26, xid 266023, offset 0 00000100:00000200:2:1217244344.471379:0:9991:0:(client.c:1871:ptlrpc_queue_wait()) @@@ -- sleeping for 100000 ticks [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000200:2:1217244344.471399:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000200:2:1217244344.471416:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000200:1:1217244344.471440:0:3049:0:(events.c:55:request_out_callback()) @@@ type 4, status 0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000040:1:1217244344.471458:0:3049:0:(client.c:1526:__ptlrpc_req_finished()) @@@ refcount now 1 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 2 fl Rpc:/0/0 rc 0/0 00000100:00000200:1:1217244344.471565:0:3049:0:(events.c:84:reply_in_callback()) @@@ type 1, status 0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00001000:1:1217244344.471578:0:3049:0:(events.c:112:reply_in_callback()) @@@ Early reply received: mlen=240 offset=0 replen=368 replied=0 unlinked=0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 0 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00000200:2:1217244344.471652:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00000200:2:1217244344.471668:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 0 for [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0 00000100:00000100:2:1217244444.471354:0:9991:0:(client.c:1198:ptlrpc_expire_one_request()) @@@ timeout (sent at 1217244344, 100s ago) [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:/0/0 rc 0/0 00000100:02000400:2:1217244444.471376:0:9991:0:(client.c:1206:ptlrpc_expire_one_request()) Request x266023 sent from [EMAIL PROTECTED] to NID [EMAIL PROTECTED] 100s ago has timed out (limit 100s). 00000100:00000200:2:1217244444.471845:0:9991:0:(events.c:84:reply_in_callback()) @@@ type 5, status 0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0 00000100:00100000:2:1217244444.471859:0:9991:0:(events.c:102:reply_in_callback()) @@@ unlink [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0 00000100:00100000:2:1217244444.471970:0:9991:0:(client.c:2091:ptlrpc_abort_inflight()) @@@ inflight [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:X/0/0 rc 0/0 00000100:00000200:2:1217244444.472014:0:9991:0:(client.c:771:ptlrpc_check_reply()) @@@ rc = 1 for [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:EX/0/0 rc -4/0 00000100:00100000:2:1217244444.472029:0:9991:0:(client.c:1891:ptlrpc_queue_wait()) Completed RPC pname:cluuid:pid:xid:nid:opc ll_cfg_requeue:38cc9155-5e64-1d01-bbf2-8b621120e1b0:9991:x266023:[EMAIL PROTECTED]:101 00000100:00100000:2:1217244444.472038:0:9991:0:(client.c:1905:ptlrpc_queue_wait()) @@@ err rc=0 status=-4 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Rpc:EX/0/0 rc -4/0 00000100:00000040:2:1217244444.472229:0:9991:0:(client.c:1526:__ptlrpc_req_finished()) @@@ refcount now 0 [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]@o2ib_0:26/25 lens 232/368 e 1 to 100 dl 1217244444 ref 1 fl Interpret:EX/0/0 rc -4/0 [EMAIL PROTECTED] # ps xa | grep -E '3049|9991' | grep -v grep 3049 ? S 0:00 [kiblnd_sd_01] 9991 ? S 0:00 [ll_cfg_requeue] ==========> MDS ([EMAIL PROTECTED], Lustre 1.6.4.3) <========== [EMAIL PROTECTED] # lctl debug_file /tmp/lustre-debug.mds.log | grep 266023 20000000:01000000:2:1217244344.469271:0:5378:0:(mgs_handler.c:549:mgs_handle()) @@@ enqueue [EMAIL PROTECTED] x266023/t0 o101->[EMAIL PROTECTED]:-1 lens 232/0 ref 0 fl Interpret:/0/0 rc 0/0 [EMAIL PROTECTED] # ps xa | grep 5378 | grep -v grep 5378 ? S 2:23 [ll_mgs_01]
_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
