From experience, rebooting the storage nodes is fine: the processes accessing them will just hang until the nodes are restored.  I've done this many times on our cluster with no ill effect.

That said, I have not tried it with kernel upgrades or Lustre release changes.  Those may behave differently and unexpectedly. Someone else on the list may have insight on these.

-Paul Edmon-

On 2/27/19 10:17 AM, Bernd Melchers wrote:
Hi all,
our environment: CentOS-7.6, [email protected], 2 mds, 7 ods, 180 clients.

Is it possible to reboot the mds and ods servers (e.g. for a new kernel or
new Lustre release) without affecting running jobs on the client nodes?
The reboot can take up to 15 minutes. Do the clients wait for
the storage nodes to reappear, or will I/O operations get errors?
Is the behaviour of a client influenced by the timeout parameter
("lctl get_param timeout") or by other parameters?
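
For reference, a minimal sketch of how the timeout value mentioned above could be collected from a client before a planned reboot. It assumes that "lctl get_param" prints one "name=value" line per parameter and that lctl is in PATH; the helper names parse_params and read_params are made up for this illustration and are not part of Lustre.

```python
import subprocess


def parse_params(text):
    """Parse 'name=value' lines (assumed lctl get_param output) into a dict."""
    params = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep:  # ignore lines that contain no '='
            params[name] = value
    return params


def read_params(*names):
    """Run 'lctl get_param <names...>' on this node and parse the result.

    Hypothetical helper: only works on a node where lctl is installed.
    """
    result = subprocess.run(
        ["lctl", "get_param", *names],
        capture_output=True,
        text=True,
        check=True,  # raise if lctl exits with an error
    )
    return parse_params(result.stdout)
```

On a client, read_params("timeout") would return the current value as a string keyed by the parameter name. Recording it on a few clients before and after the maintenance window makes it easier to correlate any I/O errors with the setting.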

Kind regards
Bernd Melchers

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
