Well, we got it into the down state by using mmsdrrestore -p to recover the configuration files into /var/mmfs/gen on cl004.
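That restore step can be sketched roughly as follows; the primary configuration server name (cl001) is an assumption for illustration, not from the thread:

```shell
# On the rebuilt node (cl004), pull the GPFS configuration files back
# into /var/mmfs/gen from the primary configuration server.
# "cl001" as the primary server name is a placeholder/assumption.
mmsdrrestore -p cl001
```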
Anyhow, we ended up with cl004 in the unknown state when it powered off. Short of removing the node, unknown is the state you get, and it seems stable for what is hopefully a short outage of cl004. Thanks.

On Thu, Feb 2, 2017 at 4:28 PM, Olaf Weiser <[email protected]> wrote:

> Many ways lead to Rome, and I agree, mmexpelnode is a nice command.
> Another approach: power it off (not reachable by ping), mmdelnode,
> power on/boot, mmaddnode.
>
> From: Aaron Knister <[email protected]>
> To: <[email protected]>
> Date: 02/02/2017 08:37 PM
> Subject: Re: [gpfsug-discuss] proper gpfs shutdown when node disappears
> Sent by: [email protected]
> ------------------------------
>
> You could forcibly expel the node (one of my favorite GPFS commands):
>
> mmexpelnode -N $nodename
>
> and then power it off after the expulsion is complete, and then do
>
> mmexpelnode -r -N $nodename
>
> which will allow it to join the cluster the next time you try to start
> up GPFS on it. You'll still likely have to go through recovery, but
> you'll skip the part where GPFS wonders where the node went prior to
> expelling it.
>
> -Aaron
>
> On 2/2/17 2:28 PM, [email protected] wrote:
> > On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:
> >
> >> but the /var/mmfs dir is obviously damaged/empty, whatever; that's
> >> why you see a message like this.
> >> Have you reinstalled that node / done any backup/restore thing?
> >
> > The internal RAID controller died a horrid death and basically took
> > all the OS partitions with it. So the node was just sort of limping
> > along, where the mmfsd process was still coping because it wasn't
> > doing any I/O to the OS partitions - but 'ssh bad-node mmshutdown'
> > wouldn't work because that requires accessing stuff in /var.
> >
> > At that point, it starts getting tempting to just use ipmitool from
> > another node to power the comatose one down - but that often causes
> > a cascade of other issues while things are stuck waiting for timeouts.
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
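Aaron's expel-and-rejoin recipe above, as a sketch; run from a healthy quorum node, and note that the node name and the IPMI/BMC details are placeholders, not from the thread:

```shell
NODENAME=cl004

# Forcibly expel the unresponsive node so the cluster stops waiting on it.
mmexpelnode -N $NODENAME

# After the expulsion completes, power the node off out-of-band,
# e.g. via IPMI (BMC hostname and credentials are placeholders):
#   ipmitool -H cl004-bmc -U admin -P secret chassis power off

# Later, clear the expulsion record so the node can rejoin the cluster
# the next time GPFS is started on it:
mmexpelnode -r -N $NODENAME
```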
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
