Well, we got it into the down state by using mmsdrrestore -p to recover the configuration files into /var/mmfs/gen on cl004.
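That restore step can be sketched roughly as follows; the primary configuration server name (cl001) is an assumption for illustration, not from the thread:

```shell
# On the rebuilt node (cl004), pull the GPFS configuration files back
# into /var/mmfs/gen from the primary configuration server.
# "cl001" as the primary server name is a placeholder/assumption.
mmsdrrestore -p cl001
```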
Anyhow, we ended up with cl004 in the unknown state when it powered off. Short of removing the node, unknown is the state you get, and it seems stable for what is hopefully a short outage of cl004. Thanks.

On Thu, Feb 2, 2017 at 4:28 PM, Olaf Weiser <[email protected]> wrote:

> Many ways lead to Rome, and I agree, mmexpelnode is a nice command.
> Another approach: power it off (not reachable by ping), mmdelnode,
> power on/boot, mmaddnode.
>
> From: Aaron Knister <[email protected]>
> To: <[email protected]>
> Date: 02/02/2017 08:37 PM
> Subject: Re: [gpfsug-discuss] proper gpfs shutdown when node disappears
> Sent by: [email protected]
> ------------------------------
>
> You could forcibly expel the node (one of my favorite GPFS commands):
>
> mmexpelnode -N $nodename
>
> and then power it off after the expulsion is complete, and then do
>
> mmexpelnode -r -N $nodename
>
> which will allow it to join the cluster the next time you try to start
> up GPFS on it. You'll still likely have to go through recovery, but
> you'll skip the part where GPFS wonders where the node went prior to
> expelling it.
>
> -Aaron
>
> On 2/2/17 2:28 PM, [email protected] wrote:
> > On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:
> >
> >> but the /var/mmfs dir is obviously damaged/empty, whatever; that's
> >> why you see a message like this.
> >> Have you reinstalled that node / done any backup/restore thing?
> >
> > The internal RAID controller died a horrid death and basically took
> > all the OS partitions with it. So the node was just sort of limping
> > along, where the mmfsd process was still coping because it wasn't
> > doing any I/O to the OS partitions - but 'ssh bad-node mmshutdown'
> > wouldn't work because that requires accessing stuff in /var.
> >
> > At that point, it starts getting tempting to just use ipmitool from
> > another node to power the comatose one down - but that often causes
> > a cascade of other issues while things are stuck waiting for timeouts.
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
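Aaron's expel-and-rejoin recipe above, as a sketch; run from a healthy quorum node, and note that the node name and the IPMI/BMC details are placeholders, not from the thread:

```shell
NODENAME=cl004

# Forcibly expel the unresponsive node so the cluster stops waiting on it.
mmexpelnode -N $NODENAME

# After the expulsion completes, power the node off out-of-band,
# e.g. via IPMI (BMC hostname and credentials are placeholders):
#   ipmitool -H cl004-bmc -U admin -P secret chassis power off

# Later, clear the expulsion record so the node can rejoin the cluster
# the next time GPFS is started on it:
mmexpelnode -r -N $NODENAME
```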
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
