Yeah, that did it; it was set to the default value of “no”.

What exactly does “no” mean as opposed to “yes”? The docs
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_tuningguide.htm
aren’t very forthcoming on this …

(Note that in multi-cluster environments it looks like we also have to set this 
on the client clusters.)

Simon

From: "robert.oester...@nuance.com" <robert.oester...@nuance.com>
Date: Friday, 13 April 2018 at 21:17
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Cc: "Simon Thompson (IT Research Support)" <s.j.thomp...@bham.ac.uk>
Subject: Re: [gpfsug-discuss] Replicated and non replicated data

Add:

unmountOnDiskFail=meta

To your config. You can add it with “-I” to have it take effect w/o reboot.
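
A minimal sketch of applying and checking that, assuming the usual 
mmchconfig/mmlsconfig syntax:

    # Set it cluster-wide; -i takes effect immediately and persists across
    # restarts, while -I (mentioned above) is immediate-only.
    mmchconfig unmountOnDiskFail=meta -i

    # Confirm the value now in effect:
    mmlsconfig unmountOnDiskFail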


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-boun...@spectrumscale.org> on behalf of "Simon Thompson 
(IT Research Support)" <s.j.thomp...@bham.ac.uk>
Reply-To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: Friday, April 13, 2018 at 3:06 PM
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] Replicated and non replicated data

I have a question about file-systems with replicated and non-replicated data.

We have a file-system where metadata is set to copies=2 and data to copies=2; we 
then use a placement policy to selectively replicate some data only once, based 
on fileset. We also place the non-replicated data into a specific pool 
(6tnlsas) so that we know where it is placed.
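
For illustration, the placement rules are roughly of this shape (a minimal 
sketch; the rule, fileset and default pool names here are placeholders, not our 
actual policy):

    /* Files created in fileset 'scratch' get a single data copy and are
       placed in the 6tnlsas pool. */
    RULE 'nonReplicatedData'
      SET POOL '6tnlsas'
      REPLICATE (1)
      FOR FILESET ('scratch')

    /* Everything else goes to a data pool and keeps the file system
       default of two data copies. */
    RULE 'default' SET POOL 'replicateddata'

installed with something like: mmchpolicy <device> policy.rules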

My understanding was that, in doing this, if we took the disks with the 
non-replicated data offline, we’d still have the FS available for users, as the 
metadata is replicated. Sure, accessing a non-replicated data file would give an 
IO error, but the rest of the FS should stay up.

We had a situation today where we wanted to take stg01 offline, so we tried 
using mmchdisk stop -d …. Once we got to about disk stg01-01_12_12, GPFS would 
refuse to stop any more disks and complain about too many disks being down; 
similarly, if we shut down the NSD servers hosting the disks, the filesystem 
would have an SG panic and force unmount.
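
For reference, the commands were along these lines (the device name gpfs01 is a 
placeholder):

    # Stop NSDs one at a time; around stg01-01_12_12 GPFS refused to
    # stop any more of them.
    mmchdisk gpfs01 stop -d "stg01-01_12_12"

    # Show any disks that are not up/ready:
    mmlsdisk gpfs01 -e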

First, am I correct in thinking that a FS with non-replicated data but 
replicated metadata should still be accessible (apart from the non-replicated 
data) when the LUNs hosting that data are down?

If so, any suggestions as to why my FS is panicking when we take down that one 
set of disks?

I thought at first we had some non-replicated metadata, so I tried an 
mmrestripefs -R --metadata-only to force it to ensure two replicas, but this 
didn’t help.
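
That was along the lines of (the device name gpfs01 and the path are 
placeholders):

    # Re-replicate, restricted to metadata, so every metadata block
    # should end up with two copies:
    mmrestripefs gpfs01 -R --metadata-only

    # Spot-check replication settings on a file afterwards:
    mmlsattr -L /gpfs01/some/file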

Running 5.0.0.2 on the NSD server nodes.

(The first time we went round this we didn’t have a FS descriptor disk, but you 
can see below that we added one.)

Thanks

Simon


_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
