I will say YMMV. I've rebooted storage nodes and have had mixed results, where we land in one of three buckets:
1) Codes breeze through, having just been stuck in D state while the OSSs reboot.
2) RPCs get stuck somewhere, and when the OSS comes back I eventually have to force an abort_recovery.
3) A code dies by not handling the timeout (not sure whether this is due to the code itself or to the client improperly handling the timeout).

On our current setup, with around 1000 clients, 50ish OSSs, and 2.5.x-vintage Lustre servers, I would say option 1 is by far the largest percentage (>95%). Options 2 and 3 happen from time to time, with likelihood greater than 0.

It's always best practice to take a scheduled outage for a kernel/version upgrade. You never know what oddity your particular setup might encounter.

Tim

-----Original Message-----
From: lustre-discuss <[email protected]> On Behalf Of Paul Edmon
Sent: Wednesday, February 27, 2019 7:54 AM
To: [email protected]
Subject: Re: [lustre-discuss] Rebooting storage nodes while jobs are running?

From experience, rebooting the storage nodes is fine; the processes accessing them will just hang until the nodes are restored. I've done this many times on our cluster with no ill effect. That said, I have not tried it with kernel upgrades or Lustre release changes. Those may do something different and unexpected. Someone else on the list may have insight on these.

-Paul Edmon-

On 2/27/19 10:17 AM, Bernd Melchers wrote:
> Hi all,
> our environment: CentOS-7.6, [email protected], 2 mds, 7 ods, 180
> clients.
>
> Is it possible to reboot the mds and ods servers (e.g. for a new kernel
> or new lustre releases) without affecting running jobs on the client nodes?
> The reboot can take up to 15 minutes. Do the clients wait for
> the storage nodes to reappear, or will I/O operations get errors?
> Is the behaviour of a client influenced by the timeout parameter
> ("lctl get_param timeout") or by other parameters?
>
> Kind regards,
> Bernd Melchers

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
