Good afternoon,

I've come across a rather vexing problem within one of my Lustre file systems: a directory whose contents can't be viewed, but into which writes can take place. Attempting to ls that directory hangs, but lctl getstripe still works.
After attempting to look in the directory, the node displays the following, even after the ls is cancelled:

[4498716.485619] Lustre: 18859:0:(client.c:2116:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1580497199/real 1580497199] req@ffff9129f7c78c00 x1652557100931648/t0(0) o101->[email protected]@o2ib:12/10 lens 696/33584 e 24 to 1 dl 1580497800 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[4498716.485642] Lustre: lustre19-MDT0000-mdc-ffff91091289f000: Connection to lustre19-MDT0000 (at 172.17.0.36@o2ib) was lost; in progress operations using this service will wait for recovery to complete
[4498716.486114] Lustre: lustre19-MDT0000-mdc-ffff91091289f000: Connection restored to 172.17.0.36@o2ib (at 172.17.0.36@o2ib)

Since the issue started, more files have been written into the directory, but none of them can be read. Further, since the issue began, the metadata server has been generating lustre-logs a few times a day.

I'm running Lustre 2.12.1 with ZFS on the metadata system (and the OSTs) on CentOS 7.6.

w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
