On Sat, Dec 10, 2016 at 4:35 AM, Aaron Knister <[email protected]> wrote:
> Thanks Eric!
>
> I have a few follow-up questions for you--
>
> Do you recall the exact versions of 3.5 and 4.1 your cluster went
> from/to? I'm curious to know what version of 4.1 you were at when you
> ran the mmchconfig.

I went from 3.5.0-28 to 4.1.0-8 to 4.2.1-1.

> Would you mind sharing any log messages related to the errors you saw
> when you ran the mmchconfig?

Unfortunately I didn't save any actual logs from the update. I did the
first cluster in early July, so nothing remains. The only note I have is:
"On update, after finalizing gpfs 4.1 the quota file format apparently
changed and caused an mmrepquota hang/deadlock. Had to shut down and
restart the whole cluster." Sorry not to be more helpful on that front.
(The rough restart sequence we used on the second cluster is sketched
below, after the quoted message.)

-Eric

> I recently did a rolling upgrade from 3.5 to 4.1 to 4.2 on two different
> clusters. Two things:
>
> Upgrading from 3.5 to 4.1, I went one node at a time and then, at the
> end, ran mmchconfig release=LATEST. Minutes after flipping to the latest
> release level the cluster became non-responsive, with mmfs panics on the
> nodes, and everything had to be restarted. The logs indicated it was a
> quota problem: in 4.1 the quota files move from externally visible files
> to internal hidden files, and I suspect that transition can't be done
> without a cluster restart. When I did the second cluster I upgraded all
> nodes and then very quickly stopped and started the entire cluster,
> issuing the mmchconfig in the middle. No quota panic problems on that
> one.
>
> Upgrading from 4.1 to 4.2, I again went one node at a time and ran
> mmchconfig release=LATEST at the end, with no cluster restart.
> Everything seemed to work okay. Later, after restarting a node, I got
> weird fstab errors on gpfs startup, and certain commands, notably
> mmfind, would fail with something like "can't find /dev/uwfs" (our
> filesystem). I restarted the whole cluster and everything began working
> normally. In this case, 4.2 got rid of /dev/fsname; just like in the
> quota case, it seems this transition can't be seamless. Doing the second
> cluster, I upgraded all nodes and then again quickly restarted gpfs to
> avoid the same problem.
>
> Other than these two quirks, I heartily thank IBM for making a very
> complex product with a very easy upgrade procedure. I could imagine many
> ways an upgrade hop of two major versions in two weeks could go very
> wrong, but the quality of the product and team makes my job very easy.
>
> -Eric
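P.S. The stop/flip/start sequence on the second cluster looked roughly
like this. It's a sketch from memory rather than a transcript, and the
mmgetstate/mmrepquota lines at the end are sanity checks I'd suggest,
not part of the upgrade itself:

    # reconstructed sequence -- sketch only, adjust for your cluster
    mmshutdown -a                 # stop GPFS on every node in the cluster
    mmchconfig release=LATEST     # commit the new release level while the daemons are down
    mmstartup -a                  # bring the whole cluster back up
    mmgetstate -a                 # sanity check: confirm all nodes reach "active"
    mmrepquota -a                 # sanity check: quota reporting works after the flip

Doing the mmchconfig while everything is down means no node is actively
using the old-format quota files when the release level flips, which
seems to be what tripped up the first cluster.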
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
