On Thu, 10 Jan 2008, Niklas Edmundsson wrote:
> However, we had this weird incident where an active client (it was
> copying 4GB files and running ls at the time) got evicted by the MDS
> and all OST's. After a while logs indicate that it did recover the
> connection to the MDT, but not the OST's ...
>
> lctl ping between the client and mds/oss's still worked.
>
> umount -f, which also hung.
After I rebooted the client it recovered. However, I got this
rather suspicious log message on the MDS (similar on the OSS's):
Jan 10 14:11:43 Lustre: 27208:0:(router.c:167:lnet_notify()) Ignoring
prediction from [EMAIL PROTECTED] of [EMAIL PROTECTED] down 4829687047 seconds
in the future
> So, what should happen in situations like this? What can I do to debug
> it? Has anyone else seen this?
>
> 130.239.78.238 is the client, 130.239.78.233 is the MGS/MDT.
>
> Logs from the client:
> ----------------------------8<------------------------
> Jan 10 12:40:38 LustreError: 11-0: an error occurred while communicating with
> [EMAIL PROTECTED] The ldlm_enqueue operation failed with -107
> Jan 10 12:40:38 LustreError: Skipped 1 previous similar message
> Jan 10 12:40:38 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection to
> service hpfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 10 12:40:38 LustreError: 167-0: This client was evicted by hpfs-MDT0000;
> in progress operations using this service will fail.
> Jan 10 12:40:38 LustreError: 7975:0:(mdc_locks.c:424:mdc_finish_enqueue())
> ldlm_cli_enqueue: -5
> Jan 10 12:40:38 LustreError: 7975:0:(client.c:519:ptlrpc_import_delay_req())
> @@@ IMP_INVALID [EMAIL PROTECTED] x649493/t0 o101->[EMAIL PROTECTED]@tcp:12
> lens 432/912 ref 1 fl Rpc:/0/0 rc 0/0
> Jan 10 12:40:38 LustreError: 7975:0:(mdc_locks.c:424:mdc_finish_enqueue())
> ldlm_cli_enqueue: -108
> Jan 10 12:41:40 LustreError: 7979:0:(client.c:519:ptlrpc_import_delay_req())
> @@@ IMP_INVALID [EMAIL PROTECTED] x649521/t0 o101->[EMAIL PROTECTED]@tcp:12
> lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
> Jan 10 12:41:40 LustreError: 7979:0:(mdc_locks.c:424:mdc_finish_enqueue())
> ldlm_cli_enqueue: -108
> Jan 10 12:42:18 LustreError: 7976:0:(client.c:519:ptlrpc_import_delay_req())
> @@@ IMP_INVALID [EMAIL PROTECTED] x649532/t0 o35->[EMAIL PROTECTED]@tcp:12
> lens 296/784 ref 1 fl Rpc:/0/0 rc 0/0
> Jan 10 12:42:18 LustreError: 7976:0:(client.c:519:ptlrpc_import_delay_req())
> Skipped 1 previous similar message
> Jan 10 12:42:18 LustreError: 7976:0:(file.c:98:ll_close_inode_openhandle())
> inode 8421378 mdc close failed: rc = -108
> Jan 10 12:42:18 LustreError: 7976:0:(file.c:98:ll_close_inode_openhandle())
> inode 8421383 mdc close failed: rc = -108
> Jan 10 12:42:18 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection
> restored to service hpfs-MDT0000 using nid [EMAIL PROTECTED]
> Jan 10 12:42:48 LustreError: 11-0: an error occurred while communicating with
> [EMAIL PROTECTED] The ost_setattr operation failed with -107
> Jan 10 12:42:48 Lustre: hpfs-OST0007-osc-ffff8100016d2c00: Connection to
> service hpfs-OST0007 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 10 12:42:48 LustreError: 167-0: This client was evicted by hpfs-OST0007;
> in progress operations using this service will fail.
> Jan 10 12:42:48 LustreError: 7999:0:(llite_lib.c:1480:ll_setattr_raw())
> obd_setattr_async fails: rc=-107
> Jan 10 12:42:48 LustreError: 7999:0:(llite_lib.c:1480:ll_setattr_raw())
> obd_setattr_async fails: rc=-5
> Jan 10 12:42:49 LustreError: 8001:0:(llite_lib.c:1480:ll_setattr_raw())
> obd_setattr_async fails: rc=-5
> Jan 10 12:42:50 LustreError: 11-0: an error occurred while communicating with
> [EMAIL PROTECTED] The ost_setattr operation failed with -107
> Jan 10 12:42:50 Lustre: hpfs-OST0000-osc-ffff8100016d2c00: Connection to
> service hpfs-OST0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 10 12:42:50 LustreError: 167-0: This client was evicted by hpfs-OST0000;
> in progress operations using this service will fail.
> Jan 10 12:42:50 LustreError: 8002:0:(llite_lib.c:1480:ll_setattr_raw())
> obd_setattr_async fails: rc=-107
> Jan 10 12:42:50 LustreError: 8002:0:(llite_lib.c:1480:ll_setattr_raw())
> Skipped 1 previous similar message
> Jan 10 12:42:52 LustreError: 11-0: an error occurred while communicating with
> [EMAIL PROTECTED] The ost_setattr operation failed with -107
> Jan 10 12:42:52 Lustre: hpfs-OST0002-osc-ffff8100016d2c00: Connection to
> service hpfs-OST0002 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 10 12:42:52 LustreError: 167-0: This client was evicted by hpfs-OST0002;
> in progress operations using this service will fail.
> Jan 10 12:50:41 LustreError: 11-0: an error occurred while communicating with
> [EMAIL PROTECTED] The ldlm_enqueue operation failed with -107
> Jan 10 12:50:41 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection to
> service hpfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress
> operations using this service will wait for recovery to complete.
> Jan 10 12:50:41 LustreError: 167-0: This client was evicted by hpfs-MDT0000;
> in progress operations using this service will fail.
> Jan 10 12:50:41 LustreError: 8017:0:(mdc_locks.c:424:mdc_finish_enqueue())
> ldlm_cli_enqueue: -5
> Jan 10 12:50:41 LustreError: 8017:0:(mdc_locks.c:424:mdc_finish_enqueue())
> Skipped 1 previous similar message
> Jan 10 12:50:41 LustreError: 8017:0:(file.c:2513:ll_inode_revalidate_fini())
> failure -5 inode 8421377
> Jan 10 12:50:41 LustreError: 8016:0:(file.c:2513:ll_inode_revalidate_fini())
> failure -5 inode 8421377
> Jan 10 12:50:56 LustreError: 8019:0:(client.c:519:ptlrpc_import_delay_req())
> @@@ IMP_INVALID [EMAIL PROTECTED] x649712/t0 o101->[EMAIL PROTECTED]@tcp:12
> lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
> Jan 10 12:50:56 LustreError: 8019:0:(client.c:519:ptlrpc_import_delay_req())
> Skipped 1 previous similar message
> Jan 10 12:50:57 LustreError: 8019:0:(mdc_locks.c:424:mdc_finish_enqueue())
> ldlm_cli_enqueue: -108
> Jan 10 12:50:57 LustreError: 8019:0:(mdc_locks.c:424:mdc_finish_enqueue())
> Skipped 1 previous similar message
> Jan 10 12:50:59 LustreError: 8022:0:(client.c:519:ptlrpc_import_delay_req())
> @@@ IMP_INVALID [EMAIL PROTECTED] x649715/t0 o101->[EMAIL PROTECTED]@tcp:12
> lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
> Jan 10 12:50:59 LustreError: 8022:0:(client.c:519:ptlrpc_import_delay_req())
> Skipped 2 previous similar messages
> Jan 10 12:52:21 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection
> restored to service hpfs-MDT0000 using nid [EMAIL PROTECTED]
> Jan 10 12:57:46 Lustre: setting import hpfs-MDT0000_UUID INACTIVE by
> administrator request
> Jan 10 12:59:26 Lustre: setting import hpfs-OST0000_UUID INACTIVE by
> administrator request
> ----------------------------8<------------------------
>
> Logs from the MGS/MDT:
> ----------------------------8<------------------------
> Jan 10 12:20:31 Lustre: MGS: haven't heard from client
> 01a8bcfc-fd98-90a9-6aeb-7c331a658b2e (at [EMAIL PROTECTED]) in 233 seconds. I
> think it's dead, and I am evicting it.
> Jan 10 12:21:11 Lustre: hpfs-MDT0000: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 271 seconds. I
> think it's dead, and I am evicting it.
> Jan 10 12:40:38 LustreError: 27332:0:(handler.c:1502:mds_handle()) operation
> 101 on unconnected MDS from [EMAIL PROTECTED]
> Jan 10 12:40:38 LustreError:
> 27332:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107)
> [EMAIL PROTECTED] x649491/t0 o101-><?>@<?>:-1 lens 512/0 ref 0 fl
> Interpret:/0/0 rc -107/0
> Jan 10 12:47:26 Lustre: hpfs-MDT0000: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 274 seconds. I
> think it's dead, and I am evicting it.
> Jan 10 12:50:41 LustreError: 27322:0:(handler.c:1502:mds_handle()) operation
> 101 on unconnected MDS from [EMAIL PROTECTED]
> Jan 10 12:50:41 LustreError:
> 27322:0:(ldlm_lib.c:1442:target_send_reply_msg()) @@@ processing error (-107)
> [EMAIL PROTECTED] x649703/t0 o101-><?>@<?>:-1 lens 432/0 ref 0 fl
> Interpret:/0/0 rc -107/0
> Jan 10 12:57:26 Lustre: hpfs-MDT0000: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 251 seconds. I
> think it's dead, and I am evicting it.
> ----------------------------8<------------------------
>
> Logs from one of the OSS's (serves two OST's). Logs similar on all
> OSS's:
> ----------------------------8<------------------------
> Jan 10 12:20:46 Lustre: hpfs-OST0002: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 246 seconds. I
> think it's dead, and I am evicting it.
> Jan 10 12:20:56 Lustre: hpfs-OST0003: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 256 seconds. I
> think it's dead, and I am evicting it.
> Jan 10 12:42:52 LustreError: 6665:0:(ldlm_lib.c:1442:target_send_reply_msg())
> @@@ processing error (-107) [EMAIL PROTECTED] x649565/t0 o2-><?>@<?>:-1 lens
> 336/0 ref 0 fl Interpret:/0/0 rc -107/0
> Jan 10 12:47:01 Lustre: hpfs-OST0002: haven't heard from client
> c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 249 seconds. I
> think it's dead, and I am evicting it.
> ----------------------------8<------------------------
>
>
> /Nikke
>
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
Jimmy Hoffa is buried here ------> X
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss