I was afraid someone would ask :)

One possible use would be testing how monitoring reacts to and/or corrects stale filesystems.

The use in my case: there's an issue we see quite often where a filesystem won't unmount when trying to shut down GPFS. Linux insists it's still busy even though just about every process on the node except init has been killed. It's a real pain because it complicates maintenance, requiring, for example, a reboot of some nodes prior to patching.

I dug into it, and it appears that when this happens the filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm trying to debug it further, but I need to be able to reproduce the condition a few more times to do that. A stripegroup panic isn't a surefire trigger, but it's the only way I've found so far to provoke this behavior somewhat on demand.
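
For reference, something along these lines is how the stuck state can be confirmed (the mount point is just a placeholder for whichever GPFS filesystem is affected); in the problem state the holder checks typically come back empty even though umount still reports busy:

FS=/gpfs/ttest                 # placeholder -- substitute the affected GPFS mount point

umount "$FS"                   # fails with "device is busy" in the bad state
fuser -vm "$FS"                # list processes holding the mount open
lsof +f -- "$FS"               # same check via lsof
grep -w "$FS" /proc/mounts     # kernel still shows it mounted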

One way I've found to trigger a mass stripegroup panic is to induce what I call a "301 error":

loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted by the system with return code 301 reason code 0
loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument

and then tickle a known race condition between nodes being expelled from the cluster and a manager node joining it. When that race hits, it seems to cause a mass stripegroup panic that's over in a few minutes. The trick is that it doesn't happen every time I go through the exercise, and when it does there's no guarantee the filesystem that panics is the one in use; if it's not an fs in use, it doesn't help me reproduce the error condition. That's why I was trying the "mmfsadm test panic" command as a more direct approach.
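
If I do end up scripting the per-node route, the idea would be to stagger it rather than fire it everywhere at once. A rough sketch of what I mean is below; nodes.txt, the fs name, the error code, the batch size, and the pause are all placeholders I haven't validated at scale:

#!/bin/bash
# Rough sketch only: drive "mmfsadm test panic" across the cluster in small
# waves instead of hitting every node at once.
FS=ttest
ERRCODE=666        # placeholder error code for "mmfsadm test panic <fs> <error code>"
BATCH=32           # nodes per wave
PAUSE=15           # seconds between waves

mapfile -t NODES < nodes.txt
for ((i = 0; i < ${#NODES[@]}; i += BATCH)); do
    for node in "${NODES[@]:i:BATCH}"; do
        ssh -o BatchMode=yes "$node" \
            "/usr/lpp/mmfs/bin/mmfsadm test panic $FS $ERRCODE" &
    done
    wait               # let this wave's panics land before starting the next
    sleep "$PAUSE"     # give the fs manager a chance to process them
done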

Hope that helps shed some light.

-Aaron

On 1/22/17 8:16 PM, Andrew Beattie wrote:
Out of curiosity -- why would you want to?
Andrew Beattie
Software Defined Storage - IT Specialist
Phone: 614-2133-7927
E-mail: [email protected]



    ----- Original message -----
    From: Aaron Knister <[email protected]>
    Sent by: [email protected]
    To: gpfsug main discussion list <[email protected]>
    Cc:
    Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere?
    Date: Mon, Jan 23, 2017 11:11 AM

    This is going to sound like a ridiculous request, but is there a way to
    cause a filesystem to panic everywhere in one "swell foop"? I'm assuming
    the answer will come with an appropriate disclaimer of "don't ever do
    this, we don't support it, it might eat your data, summon Cthulhu, etc.".
    I swear I've seen the fs manager initiate this type of operation before.

    I seem to be able to do it on a per-node basis with "mmfsadm test panic <fs>
    <error code>", but if I do that over all 1k nodes in my test cluster at
    once, it results in about 45 minutes of almost total deadlock while each
    panic is processed by the fs manager.

    -Aaron

    --
    Aaron Knister
    NASA Center for Climate Simulation (Code 606.2)
    Goddard Space Flight Center
    (301) 286-2776


--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
