Hi all,
Two (hopefully) simple questions this time around. This is for Lustre 2.14.0, and
my cluster is set up with no failover for MDTs or OSTs. OBD timeouts have not
been changed from the defaults.
Question 1:
I read on the Lustre Wiki that the appropriate order in which to umount the
various components of a Lustre filesystem is:
1. Clients
2. MDT(s)
3. OSTs
4. MGS
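Here is a minimal dry-run sketch of that order. The mount points (/mnt/lustre, /mnt/mdt0, /mnt/ost0, /mnt/ost1, /mnt/mgs) are hypothetical placeholders, not my real layout, and the script only builds the umount command lines without running anything:

```python
# Shutdown order as described on the Lustre Wiki.
SHUTDOWN_ORDER = ["client", "mdt", "ost", "mgs"]

# Hypothetical mount points, for illustration only.
MOUNTS = {
    "client": ["/mnt/lustre"],
    "mdt": ["/mnt/mdt0"],
    "ost": ["/mnt/ost0", "/mnt/ost1"],
    "mgs": ["/mnt/mgs"],
}


def umount_plan(order, mounts):
    """Return one umount command per mount point, grouped in shutdown order."""
    plan = []
    for role in order:
        for mount_point in mounts.get(role, []):
            plan.append(["umount", mount_point])
    return plan


if __name__ == "__main__":
    for cmd in umount_plan(SHUTDOWN_ORDER, MOUNTS):
        # Dry run: print the command only, nothing is unmounted.
        print(" ".join(cmd))
```

Swapping "mdt" and "ost" in SHUTDOWN_ORDER gives the alternative order I also tried, with the MDT unmounted after the OSTs.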
However, if I do it this way, the OST umounts always hang for 4 minutes 25
seconds before completing. dmesg reports:
[88944.272233] Lustre: 30178:0:(client.c:2282:ptlrpc_expire_one_request()) @@@ Request sent has timed out for slow reply: [sent 1645111309/real 1645111309] req@00000000cc9c1aeb x1724931853622016/t0(0) o39->[email protected]@tcp:12/10 lens 224/224 e 0 to 1 dl 1645111574 ref 2 fl Rpc:XNQr/0/ffffffff rc 0/-1 job:''
[88944.275884] Lustre: Failing over lustrefs-OST0000
[88944.429622] Lustre: server umount lustrefs-OST0000 complete
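The hang length can be read straight off the timed-out request. This sketch assumes that "sent" and "dl" in the ptlrpc message are the send time and the request deadline in Unix epoch seconds; the string is an excerpt of the log line above with the target name left out:

```python
import re

# Excerpt of the ptlrpc_expire_one_request() message above (target name omitted).
LOG_EXCERPT = (
    "Request sent has timed out for slow reply: "
    "[sent 1645111309/real 1645111309] req@00000000cc9c1aeb "
    "x1724931853622016/t0(0) lens 224/224 e 0 to 1 dl 1645111574 ref 2"
)


def rpc_wait_seconds(line):
    """Return the deadline ('dl') minus the send time ('sent'), in seconds."""
    sent = int(re.search(r"\[sent (\d+)/", line).group(1))
    deadline = int(re.search(r"\bdl (\d+)", line).group(1))
    return deadline - sent


if __name__ == "__main__":
    wait = rpc_wait_seconds(LOG_EXCERPT)
    minutes, seconds = divmod(wait, 60)
    print(f"{wait} s = {minutes}:{seconds:02d}")  # 265 s = 4:25
```

Under that assumption, the 4:25 hang matches the gap between when the request was sent and its deadline.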
For reference, if I swap the OSTs and the MDT (i.e., umount the OSTs before the
MDT), then all of the OST umounts are fast, but the MDT takes a whopping 8
minutes and 50 seconds to umount.
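Putting both delays in seconds makes them easier to compare. This is plain arithmetic on the two durations above; whether the longer delay corresponds to two back-to-back timeouts is only a guess on my part, not something the logs establish:

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds duration to total seconds."""
    return minutes * 60 + seconds


OST_HANG = to_seconds(4, 25)  # canonical order: each OST umount
MDT_HANG = to_seconds(8, 50)  # swapped order: MDT umounted after the OSTs

print(OST_HANG, MDT_HANG, MDT_HANG / OST_HANG)  # 265 530 2.0
```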
Why does the canonical shutdown order cause such a long delay for me, and why is
it always exactly the same length?
Question 2:
In every umount (OSTs or MDTs), whether it is fast or not, I see messages like
the following in dmesg:
[88944.275884] Lustre: Failing over lustrefs-OST0000
or
[78406.007678] Lustre: Failing over lustrefs-MDT0000
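To see which targets log this message over a whole shutdown, a small filter over saved dmesg text is enough. This sketch only pattern-matches the message text quoted above and assumes nothing about what Lustre is doing internally:

```python
import re

# The two dmesg lines quoted above.
DMESG = """\
[88944.275884] Lustre: Failing over lustrefs-OST0000
[78406.007678] Lustre: Failing over lustrefs-MDT0000
"""

FAILOVER_RE = re.compile(r"Lustre: Failing over (\S+)")


def failing_over_targets(text):
    """Return target names from 'Failing over <target>' messages, in order."""
    return FAILOVER_RE.findall(text)


print(failing_over_targets(DMESG))  # ['lustrefs-OST0000', 'lustrefs-MDT0000']
```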
There is no failover configured in my setup, and the MGS is up the entire time
in all cases. What is Lustre doing here? How do I explicitly disable this
failover attempt, since it seems at best misleading and at worst directly
related to the lengthy delays? FWIW, I have tried umount with '-f' to force the
MDT into failout rather than failover, to no avail.
Thanks in advance for any help folks can offer on this,
ellis
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org