Good afternoon,

   I've come across a rather vexing problem within one of my Lustre file 
systems: a directory whose contents can't be listed, but into which files can 
still be written.  Attempting to ls that directory hangs, but lfs getstripe 
still works.

After an attempt to list the directory, the client node displays the following, 
even after the ls is cancelled:
[4498716.485619] Lustre: 18859:0:(client.c:2116:ptlrpc_expire_one_request()) 
@@@ Request sent has timed out for slow reply: [sent 1580497199/real 
1580497199]  req@ffff9129f7c78c00 x1652557100931648/t0(0) 
o101->[email protected]@o2ib:12/10 lens 
696/33584 e 24 to 1 dl 1580497800 ref 1 fl Rpc:X/2/ffffffff rc -11/-1
[4498716.485642] Lustre: lustre19-MDT0000-mdc-ffff91091289f000: Connection to 
lustre19-MDT0000 (at 172.17.0.36@o2ib) was lost; in progress operations using 
this service will wait for recovery to complete
[4498716.486114] Lustre: lustre19-MDT0000-mdc-ffff91091289f000: Connection 
restored to 172.17.0.36@o2ib (at 172.17.0.36@o2ib)
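
For anyone reading the first message above, the timing fields can be decoded 
with a short Python sketch. It assumes that "sent" is the epoch second at which 
the request went out and that "dl" is its deadline (my reading of the ptlrpc 
request format, not something the log itself states). The LOG string and the 
request_timing() helper are illustrative names only, and the garbled target 
field is left out because only the timing fields are parsed.

```python
import re
from datetime import datetime, timezone

# Timing fields copied from the ptlrpc_expire_one_request() message above.
# The target field is omitted (it is garbled in the archive) and is not needed here.
LOG = ("Request sent has timed out for slow reply: "
       "[sent 1580497199/real 1580497199] dl 1580497800 ref 1")


def request_timing(line):
    """Return (sent, deadline, wait_seconds) parsed from a timeout message.

    Assumes 'sent' and 'dl' are Unix epoch seconds.
    """
    sent = int(re.search(r"\[sent (\d+)/real", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)\b", line).group(1))

    def to_utc(seconds):
        return datetime.fromtimestamp(seconds, tz=timezone.utc)

    return to_utc(sent), to_utc(deadline), deadline - sent


sent, deadline, wait = request_timing(LOG)
print(f"sent:     {sent:%Y-%m-%d %H:%M:%S} UTC")
print(f"deadline: {deadline:%Y-%m-%d %H:%M:%S} UTC")
print(f"client waited {wait} s before expiring the request")
```

Under those assumptions the client waited about ten minutes (601 seconds) for 
the metadata server's reply before giving up and reconnecting.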

Since the issue started more files have been written into the directory, but 
none of them can be read.

Further, since the issue began the metadata server has been generating 
lustre-logs a few times a day.

I'm running Lustre 2.12.1 with ZFS on the metadata server (and the OSTs) on 
CentOS 7.6.

w/r,

Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org