If recovery is aborted, any clients that did not complete the recovery process will be evicted by the MDS. If I remember correctly, there is a limit on how long recovery will run. The limit may be extended as more clients reconnect, but if there is no activity from the clients, the whole recovery process should time out at some point.

What does "lctl get_param mdt.*.recovery_status" show? Have any clients completed (or even started) recovery? I don't think the recovery timeout starts counting down until at least one client has reconnected. If something is preventing the clients from contacting the MDS, the server may just be sitting there waiting indefinitely.
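To make the "has any client reconnected yet?" check concrete, here is a rough Python sketch that parses the text printed by "lctl get_param mdt.*.recovery_status". It assumes the output is a series of "key: value" lines that include a status field and a connected_clients field of the form "connected/total". The sample text, the target name lustre-MDT0000, and all the values in it are made up for illustration and are not taken from the system being discussed.

```python
# Hypothetical sample of recovery_status output; the target name and all
# values are invented for illustration only.
SAMPLE = """\
mdt.lustre-MDT0000.recovery_status=
status: RECOVERING
time_remaining: 0
connected_clients: 0/120
completed_clients: 0
evicted_clients: 0
"""


def parse_recovery_status(text):
    """Collect 'key: value' lines into a dict, skipping anything else."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def clients_connected(fields):
    """Return (connected, total), assuming a 'connected/total' value."""
    connected, _, total = fields.get("connected_clients", "0/0").partition("/")
    return int(connected), int(total)


def still_waiting_for_first_client(fields):
    """True if recovery is in progress but no client has reconnected."""
    connected, _ = clients_connected(fields)
    return fields.get("status") == "RECOVERING" and connected == 0


fields = parse_recovery_status(SAMPLE)
print(fields["status"], clients_connected(fields))
print("waiting for first client:", still_waiting_for_first_client(fields))
```

If a check like this reports that recovery is in progress with zero connected clients, that would fit the case where the clients cannot reach the MDS at all, so connectivity is worth checking before aborting recovery.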
--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu

On May 20, 2014, at 10:13 PM, Javed Shaikh <[email protected]> wrote:

> CentOS 6.4 / Lustre 2.4.2 (both client and servers)
>
> hi,
>
> it looks like MDTs are not recovering after more than 12hours of being in
> that state.
> there’s hardly any activity happening on the MDS.
>
> what would happen if the recovery is aborted through lctl?
>
> thanks,
> javed
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss
