Hi Patrick,

Thanks for the additional input! I'll skip the exciting live upgrade this time then.
Regards,
--
Peter Bortas, NSC

On Mon, Jul 11, 2016 at 1:39 AM, Patrick Farrell <p...@cray.com> wrote:
> Because of the issue highlighted by Andreas - a great number of possible
> states when a job is running - Cray does our upgrades with the system quiet.
> Live upgrades aren't something we even consider - The potential damage is
> too large for the time saved. Especially since the actual *upgrade* usually
> doesn't take very long at all, generally speaking. For 2.4 to 2.5, the
> 'clean' version is just stop activity to the filesystem, unmount it on
> clients, stop it/unmount it server side, install the new Lustre RPMs, and
> start it up again. This is relatively quick.
>
> ________________________________
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of
> Dilger, Andreas <andreas.dil...@intel.com>
> Sent: Sunday, July 10, 2016 5:53:38 PM
> To: Peter Bortas
> Cc: lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] Is live upgrade of 2.4 to 2.5 unproblematic?
>
> We typically test 2.x->2.x+1 upgrades, both live and offline, for every
> version of Lustre. That said, there are a large number of possible states
> that may occur with a running job, so it isn't possible to test everything.
> If you are ready to abort the long-running job, then trying the live upgrade
> and having to restart if it fails isn't any worse.
>
> I'd always recommend to make a backup of the MDT, regardless of whether you
> are doing an upgrade or not, since it is a lot easier to restore only the
> MDT if there are problems than to restore the whole filesystem.
>
> Cheers, Andreas
>
>> On Jul 8, 2016, at 09:08, Peter Bortas <bor...@gmail.com> wrote:
>>
>> I'm upgrading a few ZFS backed filesystems from 2.4.2 to 2.5.3 (both
>> from the llnl chaos branch). Clients are already running 2.5EE. It's a
>> simple setup with no failover or mirroring of MDSs or OSSs. Originally
>> the plan was to do this with the filesystems unmounted on the clients,
>> but it looks like it will be hard to get a window to do that any time
>> soon.
>>
>> Are there any known problems just doing an online upgrade 2.4 -> 2.5?
>>
>> Is the recommended method still OSSs first and MDS last?
>>
>> (Obviously the clients will lock up if they access these filesystems,
>> but locking them up for a fraction of a day beats aborting a 7 day
>> compute job.)
>>
>> Regards,
>> --
>> Peter Bortas, NSC
>> _______________________________________________
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org