If recovery is aborted, any clients that did not complete the recovery process
will be evicted by the MDS.  If I remember correctly, there is a limit on the
amount of time that recovery will run.  The time limit may be extended as more
clients reconnect, but if there is no activity from the clients, the whole
recovery process should time out at some point.  What does
"lctl get_param mdt.*.recovery_status" show?  Have any clients completed (or
even started) recovery?  I don't think the recovery timeout starts counting
down until at least one client has reconnected.  If something is preventing the
clients from contacting the MDS, the server may just be sitting there
indefinitely.
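
To make the question above concrete, here is a rough sketch of how the
recovery_status output could be checked for client progress.  It assumes the
output is a set of "key: value" lines and that completed_clients is reported
as "done/total"; the sample text and the function names are made up for
illustration and are not output from a real system.

```python
# Hypothetical sample, NOT captured from a real MDS. The field names and the
# "done/total" format are assumptions about what recovery_status reports.
SAMPLE = """\
mdt.testfs-MDT0000.recovery_status=
status: RECOVERING
time_remaining: 0
connected_clients: 0/10
completed_clients: 0/10
"""


def parse_recovery_status(text):
    """Collect 'key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def completed_clients(fields):
    """Return (done, total) from a 'done/total' value, or (0, 0) if absent."""
    done, _, total = fields.get("completed_clients", "0/0").partition("/")
    return int(done), int(total) if total else 0


fields = parse_recovery_status(SAMPLE)
done, total = completed_clients(fields)
if fields.get("status") == "RECOVERING" and done == 0:
    # No client has finished recovery yet, which fits the "server is just
    # sitting there" scenario described above.
    print("recovery in progress, no clients completed (%d/%d)" % (done, total))
```

In practice the text would come from running the lctl command on the MDS
rather than from a hard-coded string.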

-- 
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu


On May 20, 2014, at 10:13 PM, Javed Shaikh <[email protected]> wrote:

> CentOS 6.4 / Lustre 2.4.2 (both client and servers)
>  
> hi,
>  
> it looks like MDTs are not recovering after more than 12hours of being in 
> that state.
> there’s hardly any activity happening on the MDS.
>  
> what would happen if the recovery is aborted through lctl?
>  
> thanks,
> javed
> _______________________________________________
> Lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/mailman/listinfo/lustre-discuss

