Hi!
We've started to poke and prod at Lustre 1.6.4.1, and it seems to
mostly work (we haven't had it OOPS on us yet like the earlier
1.6-versions did).
However, we had this weird incident where an active client (it was
copying 4GB files and running ls at the time) got evicted by the MDS
and all OST's. After a while logs indicate that it did recover the
connection to the MDT, but not the OST's ...
lctl ping between the client and mds/oss's still worked.
After a while I got tired of waiting for it to recover and did
umount -f, which also hung.
So, what should happen in situations like this? What can I do to debug
it? Has anyone else seen this?
130.239.78.238 is the client, 130.239.78.233 is the MGS/MDT.
Logs from the client:
----------------------------8<------------------------
Jan 10 12:40:38 LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ldlm_enqueue operation failed with -107
Jan 10 12:40:38 LustreError: Skipped 1 previous similar message
Jan 10 12:40:38 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection to
service hpfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Jan 10 12:40:38 LustreError: 167-0: This client was evicted by hpfs-MDT0000; in
progress operations using this service will fail.
Jan 10 12:40:38 LustreError: 7975:0:(mdc_locks.c:424:mdc_finish_enqueue())
ldlm_cli_enqueue: -5
Jan 10 12:40:38 LustreError: 7975:0:(client.c:519:ptlrpc_import_delay_req())
@@@ IMP_INVALID [EMAIL PROTECTED] x649493/t0 o101->[EMAIL PROTECTED]@tcp:12
lens 432/912 ref 1 fl Rpc:/0/0 rc 0/0
Jan 10 12:40:38 LustreError: 7975:0:(mdc_locks.c:424:mdc_finish_enqueue())
ldlm_cli_enqueue: -108
Jan 10 12:41:40 LustreError: 7979:0:(client.c:519:ptlrpc_import_delay_req())
@@@ IMP_INVALID [EMAIL PROTECTED] x649521/t0 o101->[EMAIL PROTECTED]@tcp:12
lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
Jan 10 12:41:40 LustreError: 7979:0:(mdc_locks.c:424:mdc_finish_enqueue())
ldlm_cli_enqueue: -108
Jan 10 12:42:18 LustreError: 7976:0:(client.c:519:ptlrpc_import_delay_req())
@@@ IMP_INVALID [EMAIL PROTECTED] x649532/t0 o35->[EMAIL PROTECTED]@tcp:12
lens 296/784 ref 1 fl Rpc:/0/0 rc 0/0
Jan 10 12:42:18 LustreError: 7976:0:(client.c:519:ptlrpc_import_delay_req())
Skipped 1 previous similar message
Jan 10 12:42:18 LustreError: 7976:0:(file.c:98:ll_close_inode_openhandle())
inode 8421378 mdc close failed: rc = -108
Jan 10 12:42:18 LustreError: 7976:0:(file.c:98:ll_close_inode_openhandle())
inode 8421383 mdc close failed: rc = -108
Jan 10 12:42:18 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection restored
to service hpfs-MDT0000 using nid [EMAIL PROTECTED]
Jan 10 12:42:48 LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ost_setattr operation failed with -107
Jan 10 12:42:48 Lustre: hpfs-OST0007-osc-ffff8100016d2c00: Connection to
service hpfs-OST0007 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Jan 10 12:42:48 LustreError: 167-0: This client was evicted by hpfs-OST0007; in
progress operations using this service will fail.
Jan 10 12:42:48 LustreError: 7999:0:(llite_lib.c:1480:ll_setattr_raw())
obd_setattr_async fails: rc=-107
Jan 10 12:42:48 LustreError: 7999:0:(llite_lib.c:1480:ll_setattr_raw())
obd_setattr_async fails: rc=-5
Jan 10 12:42:49 LustreError: 8001:0:(llite_lib.c:1480:ll_setattr_raw())
obd_setattr_async fails: rc=-5
Jan 10 12:42:50 LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ost_setattr operation failed with -107
Jan 10 12:42:50 Lustre: hpfs-OST0000-osc-ffff8100016d2c00: Connection to
service hpfs-OST0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Jan 10 12:42:50 LustreError: 167-0: This client was evicted by hpfs-OST0000; in
progress operations using this service will fail.
Jan 10 12:42:50 LustreError: 8002:0:(llite_lib.c:1480:ll_setattr_raw())
obd_setattr_async fails: rc=-107
Jan 10 12:42:50 LustreError: 8002:0:(llite_lib.c:1480:ll_setattr_raw()) Skipped
1 previous similar message
Jan 10 12:42:52 LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ost_setattr operation failed with -107
Jan 10 12:42:52 Lustre: hpfs-OST0002-osc-ffff8100016d2c00: Connection to
service hpfs-OST0002 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Jan 10 12:42:52 LustreError: 167-0: This client was evicted by hpfs-OST0002; in
progress operations using this service will fail.
Jan 10 12:50:41 LustreError: 11-0: an error occurred while communicating with
[EMAIL PROTECTED] The ldlm_enqueue operation failed with -107
Jan 10 12:50:41 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection to
service hpfs-MDT0000 via nid [EMAIL PROTECTED] was lost; in progress operations
using this service will wait for recovery to complete.
Jan 10 12:50:41 LustreError: 167-0: This client was evicted by hpfs-MDT0000; in
progress operations using this service will fail.
Jan 10 12:50:41 LustreError: 8017:0:(mdc_locks.c:424:mdc_finish_enqueue())
ldlm_cli_enqueue: -5
Jan 10 12:50:41 LustreError: 8017:0:(mdc_locks.c:424:mdc_finish_enqueue())
Skipped 1 previous similar message
Jan 10 12:50:41 LustreError: 8017:0:(file.c:2513:ll_inode_revalidate_fini())
failure -5 inode 8421377
Jan 10 12:50:41 LustreError: 8016:0:(file.c:2513:ll_inode_revalidate_fini())
failure -5 inode 8421377
Jan 10 12:50:56 LustreError: 8019:0:(client.c:519:ptlrpc_import_delay_req())
@@@ IMP_INVALID [EMAIL PROTECTED] x649712/t0 o101->[EMAIL PROTECTED]@tcp:12
lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
Jan 10 12:50:56 LustreError: 8019:0:(client.c:519:ptlrpc_import_delay_req())
Skipped 1 previous similar message
Jan 10 12:50:57 LustreError: 8019:0:(mdc_locks.c:424:mdc_finish_enqueue())
ldlm_cli_enqueue: -108
Jan 10 12:50:57 LustreError: 8019:0:(mdc_locks.c:424:mdc_finish_enqueue())
Skipped 1 previous similar message
Jan 10 12:50:59 LustreError: 8022:0:(client.c:519:ptlrpc_import_delay_req())
@@@ IMP_INVALID [EMAIL PROTECTED] x649715/t0 o101->[EMAIL PROTECTED]@tcp:12
lens 512/912 ref 1 fl Rpc:P/0/0 rc 0/0
Jan 10 12:50:59 LustreError: 8022:0:(client.c:519:ptlrpc_import_delay_req())
Skipped 2 previous similar messages
Jan 10 12:52:21 Lustre: hpfs-MDT0000-mdc-ffff8100016d2c00: Connection restored
to service hpfs-MDT0000 using nid [EMAIL PROTECTED]
Jan 10 12:57:46 Lustre: setting import hpfs-MDT0000_UUID INACTIVE by
administrator request
Jan 10 12:59:26 Lustre: setting import hpfs-OST0000_UUID INACTIVE by
administrator request
----------------------------8<------------------------
Logs from the MGS/MDT:
----------------------------8<------------------------
Jan 10 12:20:31 Lustre: MGS: haven't heard from client
01a8bcfc-fd98-90a9-6aeb-7c331a658b2e (at [EMAIL PROTECTED]) in 233 seconds. I
think it's dead, and I am evicting it.
Jan 10 12:21:11 Lustre: hpfs-MDT0000: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 271 seconds. I
think it's dead, and I am evicting it.
Jan 10 12:40:38 LustreError: 27332:0:(handler.c:1502:mds_handle()) operation
101 on unconnected MDS from [EMAIL PROTECTED]
Jan 10 12:40:38 LustreError: 27332:0:(ldlm_lib.c:1442:target_send_reply_msg())
@@@ processing error (-107) [EMAIL PROTECTED] x649491/t0 o101-><?>@<?>:-1 lens
512/0 ref 0 fl Interpret:/0/0 rc -107/0
Jan 10 12:47:26 Lustre: hpfs-MDT0000: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 274 seconds. I
think it's dead, and I am evicting it.
Jan 10 12:50:41 LustreError: 27322:0:(handler.c:1502:mds_handle()) operation
101 on unconnected MDS from [EMAIL PROTECTED]
Jan 10 12:50:41 LustreError: 27322:0:(ldlm_lib.c:1442:target_send_reply_msg())
@@@ processing error (-107) [EMAIL PROTECTED] x649703/t0 o101-><?>@<?>:-1 lens
432/0 ref 0 fl Interpret:/0/0 rc -107/0
Jan 10 12:57:26 Lustre: hpfs-MDT0000: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 251 seconds. I
think it's dead, and I am evicting it.
----------------------------8<------------------------
Logs from one of the OSS's (serves two OST's). Logs similar on all
OSS's:
----------------------------8<------------------------
Jan 10 12:20:46 Lustre: hpfs-OST0002: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 246 seconds. I
think it's dead, and I am evicting it.
Jan 10 12:20:56 Lustre: hpfs-OST0003: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 256 seconds. I
think it's dead, and I am evicting it.
Jan 10 12:42:52 LustreError: 6665:0:(ldlm_lib.c:1442:target_send_reply_msg())
@@@ processing error (-107) [EMAIL PROTECTED] x649565/t0 o2-><?>@<?>:-1 lens
336/0 ref 0 fl Interpret:/0/0 rc -107/0
Jan 10 12:47:01 Lustre: hpfs-OST0002: haven't heard from client
c542d305-5995-f79d-1c8d-c9578393358a (at [EMAIL PROTECTED]) in 249 seconds. I
think it's dead, and I am evicting it.
----------------------------8<------------------------
/Nikke
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Niklas Edmundsson, Admin @ {acc,hpc2n}.umu.se | [EMAIL PROTECTED]
---------------------------------------------------------------------------
The magic of Windows - turn a 486-50 into a 4MHz XT...
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss