Are adaptive timeouts enabled?
(On the MGS/MDS, run: lctl get_param at_max)
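
As a rough sketch of that check: in Lustre, an at_max value of 0 means adaptive timeouts are disabled and the static timeout is used instead. The helper name at_status below is hypothetical and only there for illustration. The extra parameters queried (at_min, at_history, timeout) are assumed to be present on this Lustre version.

```shell
#!/bin/sh
# Hypothetical helper: interpret an at_max value.
# 0 means adaptive timeouts are off; any positive value is the
# upper bound, in seconds, on the adaptive timeout estimate.
at_status() {
    if [ "$1" -eq 0 ]; then
        echo "disabled"
    else
        echo "enabled (max ${1}s)"
    fi
}

# Only query Lustre when lctl is installed on this node.
if command -v lctl >/dev/null 2>&1; then
    # -n prints the bare value without the parameter name.
    at_max=$(lctl get_param -n at_max 2>/dev/null)
    if [ -n "$at_max" ]; then
        at_status "$at_max"
    fi
    # Related tunables, shown for context (assumed to exist here).
    lctl get_param at_min at_history timeout 2>/dev/null || true
fi
```

Running the same check on the client that logged the errors as well as on the servers may show whether the nodes agree on the setting.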

Quentin Bouyer
System Engineer | Sgi France
+33 6 80 36 49 64
[email protected]

________________________________
From: [email protected] On Behalf Of Ashok Nulguda
Sent: Friday, 30 September 2011 06:39
To: [email protected]
Subject: [Lustre-discuss] help

Dear All,

I am seeing Lustre errors on my HPC cluster, as shown below. Could anyone
please help me resolve this problem?
Thanks in advance.
Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre: 
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1 previous similar 
message
Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: 
lustre-OST0008-osc-ffff880b272cf800: Connection to service lustre-OST0008 via 
nid 10.148.0.106@o2ib was lost; in progress operations using this service will 
wait for recovery to complete.
Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre: 
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request 
x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to NID 
10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
Sep 30 08:40:24 service0 kernel: [343139.837263]   req@ffff880a5f800c00 
x1380984193067288/t0 o3->[email protected]@o2ib:6/4 lens 448/592 
e 0 to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0
Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre: 
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38 previous similar 
messages
Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError: 
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: 
canceling anyway
Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError: 
9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar 
message
Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError: 
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError: 
9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar 
message
Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre: 
8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request 
x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to NID 
10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
Sep 30 08:40:25 service0 kernel: [343140.837311]   req@ffff880a557c4400 
x1380984193067299/t0 o3->[email protected]@o2ib:6/4 lens 448/592 
e 0 to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0
Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre: 
8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4 previous similar 
messages
Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError: 
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: 
canceling anyway
Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError: 
22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError: 
30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar 
message
Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre: 
22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request 
x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to NID 
10.148.0.106@o2ib 14s ago has timed out (14s prior to deadline).
Sep 30 08:40:33 service0 kernel: [343148.245686]   req@ffff8805c879e800 
x1380984193067302/t0 o103->[email protected]@o2ib:17/18 lens 
296/384 e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0
Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre: 
22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2 previous similar 
messages
Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError: 
22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: 
canceling anyway
Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError: 
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError: 
22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar 
message
Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError: 11-0: an error 
occurred while communicating with 10.148.0.106@o2ib. The ost_connect operation 
failed with -16
Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError: Skipped 1 
previous similar message
Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError: 167-0: This 
client was evicted by lustre-OST000b; in progress operations using this service 
will fail.
Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError: 
30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError: 
8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID  
req@ffff88049528c400 x1380984193067406/t0 
o3->[email protected]@o2ib:6/4 lens 448/592 e 0 to 1 dl 0 ref 2 
fl Rpc:/0/0 rc 0/0
Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre: 
lustre-OST0000-osc-ffff880b272cf800: Connection restored to service 
lustre-OST0000 using nid 10.148.0.106@o2ib.
Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre: 
lustre-OST0006-osc-ffff880b272cf800: Connection restored to service 
lustre-OST0006 using nid 10.148.0.106@o2ib.
Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre: 
lustre-OST0003-osc-ffff880b272cf800: Connection restored to service 
lustre-OST0003 using nid 10.148.0.106@o2ib.
Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3 previous 
similar messages


Thanks and Regards
Ashok

--
Ashok Nulguda
TATA ELXSI LTD
Mb : +91 9689945767
Email: [email protected]

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss