Dear All,

I am seeing the Lustre errors below on my HPC system. Could anyone help me resolve this problem? Thanks in advance.

Sep 30 08:40:23 service0 kernel: [343138.837222] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 1 previous similar message
Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: lustre-OST0008-osc-ffff880b272cf800: Connection to service lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress operations using this service will wait for recovery to complete.
Sep 30 08:40:24 service0 kernel: [343139.837260] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067288 sent from lustre-OST0006-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
Sep 30 08:40:24 service0 kernel: [343139.837263] req@ffff880a5f800c00 x1380984193067288/t0 o3->[email protected]@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352224 ref 2 fl Rpc:/0/0 rc 0/0
Sep 30 08:40:24 service0 kernel: [343139.837269] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 38 previous similar messages
Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Sep 30 08:40:24 service0 kernel: [343140.129290] LustreError: 9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
Sep 30 08:40:24 service0 kernel: [343140.129295] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:24 service0 kernel: [343140.129299] LustreError: 9983:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
Sep 30 08:40:25 service0 kernel: [343140.837308] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067299 sent from lustre-OST0010-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 7s ago has timed out (7s prior to deadline).
Sep 30 08:40:25 service0 kernel: [343140.837311] req@ffff880a557c4400 x1380984193067299/t0 o3->[email protected]@o2ib:6/4 lens 448/592 e 0 to 1 dl 1317352225 ref 2 fl Rpc:/0/0 rc 0/0
Sep 30 08:40:25 service0 kernel: [343140.837316] Lustre: 8300:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 4 previous similar messages
Sep 30 08:40:26 service0 kernel: [343141.245365] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Sep 30 08:40:26 service0 kernel: [343141.245371] LustreError: 22729:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:26 service0 kernel: [343141.245378] LustreError: 30978:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Skipped 1 previous similar message
Sep 30 08:40:33 service0 kernel: [343148.245683] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request x1380984193067302 sent from lustre-OST0004-osc-ffff880b272cf800 to NID 10.148.0.106@o2ib 14s ago has timed out (14s prior to deadline).
Sep 30 08:40:33 service0 kernel: [343148.245686] req@ffff8805c879e800 x1380984193067302/t0 o103->[email protected]@o2ib:17/18 lens 296/384 e 0 to 1 dl 1317352233 ref 1 fl Rpc:N/0/0 rc 0/0
Sep 30 08:40:33 service0 kernel: [343148.245692] Lustre: 22725:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 2 previous similar messages
Sep 30 08:40:33 service0 kernel: [343148.245708] LustreError: 22725:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 from cancel RPC: canceling anyway
Sep 30 08:40:33 service0 kernel: [343148.245714] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) ldlm_cli_cancel_list: -11
Sep 30 08:40:33 service0 kernel: [343148.245717] LustreError: 22725:0:(ldlm_request.c:1587:ldlm_cli_cancel_list()) Skipped 1 previous similar message
Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError: 11-0: an error occurred while communicating with 10.148.0.106@o2ib. The ost_connect operation failed with -16
Sep 30 08:40:36 service0 kernel: [343151.548008] LustreError: Skipped 1 previous similar message
Sep 30 08:40:36 service0 kernel: [343151.548024] LustreError: 167-0: This client was evicted by lustre-OST000b; in progress operations using this service will fail.
Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError: 30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5
Sep 30 08:40:36 service0 kernel: [343151.550210] LustreError: 8300:0:(client.c:858:ptlrpc_import_delay_req()) @@@ IMP_INVALID req@ffff88049528c400 x1380984193067406/t0 o3->[email protected]@o2ib:6/4 lens 448/592 e 0 to 1 dl 0 ref 2 fl Rpc:/0/0 rc 0/0
Sep 30 08:40:36 service0 kernel: [343151.594742] Lustre: lustre-OST0000-osc-ffff880b272cf800: Connection restored to service lustre-OST0000 using nid 10.148.0.106@o2ib.
Sep 30 08:40:36 service0 kernel: [343151.837203] Lustre: lustre-OST0006-osc-ffff880b272cf800: Connection restored to service lustre-OST0006 using nid 10.148.0.106@o2ib.
Sep 30 08:40:37 service0 kernel: [343152.842631] Lustre: lustre-OST0003-osc-ffff880b272cf800: Connection restored to service lustre-OST0003 using nid 10.148.0.106@o2ib.
Sep 30 08:40:37 service0 kernel: [343152.842636] Lustre: Skipped 3 previous similar messages
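
To get a quick overview of a log like the one above, the negative return codes and the affected targets can be tallied with a short script. What follows is only a minimal sketch, not a diagnostic tool: the names `summarize`, `describe` and `SAMPLE` are hypothetical, the regular expressions are assumptions based on the message shapes seen in this excerpt, and the errno names are the standard Linux values (5 = EIO, 11 = EAGAIN, 16 = EBUSY).

```python
import re
from collections import Counter

# Linux errno names for the negative return codes that appear in this log.
LINUX_ERRNO = {5: "EIO", 11: "EAGAIN", 16: "EBUSY"}

# Message shapes taken from the excerpt; other Lustre messages are ignored.
RC_PATTERNS = [
    re.compile(r"Got rc (-\d+) from cancel RPC"),
    re.compile(r"ldlm_cli_cancel_list: (-\d+)"),
    re.compile(r"operation failed with (-\d+)"),
    re.compile(r"couldn't unlock (-\d+)"),
]
NID_PATTERN = re.compile(r"\b(\d+\.\d+\.\d+\.\d+@\w+)")
OST_PATTERN = re.compile(r"\b(lustre-OST[0-9a-fA-F]{4})\b")

# A few lines copied from the log above, used as demo input.
SAMPLE = [
    "Sep 30 08:40:23 service0 kernel: [343138.837233] Lustre: "
    "lustre-OST0008-osc-ffff880b272cf800: Connection to service "
    "lustre-OST0008 via nid 10.148.0.106@o2ib was lost; in progress "
    "operations using this service will wait for recovery to complete.",
    "Sep 30 08:40:24 service0 kernel: [343140.129284] LustreError: "
    "9983:0:(ldlm_request.c:1025:ldlm_cli_cancel_req()) Got rc -11 "
    "from cancel RPC: canceling anyway",
    "Sep 30 08:40:36 service0 kernel: [343151.548005] LustreError: 11-0: "
    "an error occurred while communicating with 10.148.0.106@o2ib. "
    "The ost_connect operation failed with -16",
    "Sep 30 08:40:36 service0 kernel: [343151.548250] LustreError: "
    "30452:0:(llite_mmap.c:210:ll_tree_unlock()) couldn't unlock -5",
]


def summarize(lines):
    """Count return codes, server NIDs and OST names seen in log lines."""
    codes, nids, osts = Counter(), Counter(), Counter()
    for line in lines:
        for pattern in RC_PATTERNS:
            match = pattern.search(line)
            if match:
                codes[int(match.group(1))] += 1
        nids.update(NID_PATTERN.findall(line))
        # Count each OST at most once per line, even if it is named twice.
        osts.update(set(OST_PATTERN.findall(line)))
    return codes, nids, osts


def describe(code):
    """Map a negative kernel return code to its Linux errno name."""
    return LINUX_ERRNO.get(abs(code), "unknown")


if __name__ == "__main__":
    codes, nids, osts = summarize(SAMPLE)
    for code, count in sorted(codes.items()):
        print(f"rc {code} ({describe(code)}): {count}")
    for nid, count in nids.items():
        print(f"NID {nid}: {count} line(s)")
    for ost, count in sorted(osts.items()):
        print(f"{ost}: {count} line(s)")
```

In this excerpt every timeout, the failed ost_connect and the restored connections all name the same NID, 10.148.0.106@o2ib, so a per-NID tally makes it easy to see that a single server is involved.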

Thanks and Regards,
Ashok

--
Ashok Nulguda
TATA ELXSI LTD
Mb: +91 9689945767
Email: [email protected]

_______________________________________________ Lustre-discuss mailing list [email protected] http://lists.lustre.org/mailman/listinfo/lustre-discuss
