In addition to the replicas, node quorum, and a tiebreaker disk, recommended practice is a third site with a very small node that provides both node quorum and file system quorum.
What I think is missing here is what is often called a third site, which can consist of something as simple as a laptop. You can build two-node clusters with a local tiebreaker disk to avoid split brain, but you also need a third failure group that holds a copy of the file system descriptors, typically instantiated as a descOnly NSD, to maintain file system quorum in addition to node quorum. This is often done at a third location, so that if one site goes down, both node quorum and file system quorum are maintained and the tiebreaker is still supported. The third-site NSD could be a disk internal to the laptop, not on a SAN, of about 128 MB in size.
There is a good discussion of synchronous mirroring in the Spectrum Scale Advanced Administration Guide. The "mmcrnsd" command documentation describes the descOnly usage as follows:
descOnly
Indicates that the disk contains no data and no file metadata. Such a disk is used solely to keep a copy of the file system descriptor, and can be used as a third failure group in certain disaster recovery configurations. For more information, see the help topic "Synchronous mirroring utilizing GPFS replication" in IBM Spectrum Scale: Advanced Administration Guide, page 85, "Establishing disaster recovery for a cluster." Ignore the discussion of PPRC and FlashCopy, as these are not related to your question.
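As a concrete sketch of how such a third-site disk might be created (the device path, NSD name, server name, file system name, and stanza file location are illustrative assumptions, not taken from the guide):

```
# Illustrative only -- define a small descOnly NSD at the third site.
# Device, NSD name, server, and failure group are hypothetical.
cat > /tmp/desc.stanza <<'EOF'
%nsd: device=/dev/sdb
  nsd=site3_desc
  servers=site3node
  usage=descOnly
  failureGroup=3
EOF

mmcrnsd -F /tmp/desc.stanza           # create the NSD
mmadddisk fsname -F /tmp/desc.stanza  # add it to an existing file system
```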
A PDF of all the manuals is available at https://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html
There is also a description of file system descriptor quorum in the Concepts, Planning, and Installation Guide, page 52 (chapter 2 in general is worth reading for your area of questioning). If one site were failure group 1 and the other failure group 2, then a replication factor of 2 would force the synchronous mirror to occur across your two sites. But you could lose quorum on the file system, so while the cluster would still be running, the file systems might not be. It states:
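As a hedged illustration of the replication settings this implies (the file system name and stanza file are placeholders; the flags are the standard mmcrfs replication options):

```
# Illustrative only: default and maximum replicas of 2 for both
# metadata (-m/-M) and data (-r/-R), so each block is mirrored
# across the two failure groups -- one per site.
mmcrfs gpfs1 -F nsd.stanza -m 2 -M 2 -r 2 -R 2
```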
File system descriptor quorum
A GPFS structure called the file system descriptor is initially written to every disk in the file system and is replicated on a subset of the disks as changes to the file system occur, such as the adding or deleting of disks. Based on the number of failure groups and disks, GPFS creates one to five replicas of the descriptor:
- If there are at least five different failure groups, five replicas are created.
- If there are at least three different disks, three replicas are created.
- If there are only one or two disks, a replica is created on each disk.
Once it decides how many replicas to create, GPFS picks disks to hold the replicas, so that all replicas are in different failure groups, if possible, to reduce the risk of multiple failures. In picking replica locations, the current state of the disks is taken into account. Stopped or suspended disks are avoided. Similarly, when a failed disk is brought back online, GPFS might rebalance the file system descriptors in order to assure reliability across the failure groups. The disks used to hold the file system descriptor replicas can be seen by running the mmlsdisk fsname -L command and looking for the string desc in the Remarks column.
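The check described above might look like this in practice (the file system name is a placeholder):

```
# List all disks with descriptor placement details; "desc" in the
# Remarks column marks the disks holding descriptor replicas.
mmlsdisk fsname -L | grep desc
```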
GPFS requires that a majority of the replicas on the subset of disks remain available to sustain file system operations:
- If there are at least five different replicas, GPFS can tolerate a loss of two of the five replicas.
- If there are at least three replicas, GPFS can tolerate a loss of one of the three replicas.
- If there are fewer than three replicas, a loss of one replica might make the descriptor inaccessible.
The loss of all disks in a disk failure group might cause a majority of file system descriptors to become unavailable and inhibit further file system operations. For example, if your file system is backed up by three or more disks that are assigned to two separate disk failure groups, one of the failure groups will be assigned two of the file system descriptor replicas, while the other failure group will be assigned only one replica. If all of the disks in the disk failure group that contains the two replicas were to become unavailable, the file system would also become unavailable. To avoid this particular scenario, you might want to introduce a third disk failure group consisting of a single disk that is designated as a descOnly disk. This disk would exist solely to contain a replica of the file system descriptor (that is, it would not contain any file system metadata or data). This disk should be at least 128MB in size.
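The replica-count rules quoted above reduce to simple logic; this helper is not a GPFS tool, just a sketch of the documented behavior:

```shell
# Sketch of the documented descriptor-replica rules (not a GPFS
# command): given failure-group and disk counts, print how many
# file system descriptor replicas GPFS creates.
fsdesc_replicas() {
    local failure_groups=$1 disks=$2
    if [ "$failure_groups" -ge 5 ]; then
        echo 5            # can tolerate losing two of five
    elif [ "$disks" -ge 3 ]; then
        echo 3            # can tolerate losing one of three
    else
        echo "$disks"     # one replica per disk
    fi
}

fsdesc_replicas 2 6    # two sites, six disks: prints 3
```

With only two failure groups, one group necessarily holds two of the three replicas, which is exactly the exposure the descOnly third failure group removes.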
For more information on this topic, see “Network Shared Disk (NSD) creation considerations” on page 50 and the topic "Establishing disaster recovery for your GPFS cluster" in the IBM Spectrum Scale: Advanced Administration Guide.
Edward L. Boyd ( Ed )
Client Technical Specialist, Spectrum Scale (GPFS) / Spectrum Compute (Platform) / IBM Cloud Storage (Cleversafe) / Elastic Storage Server
IBM Systems, Software Defined Storage Solutions (SDSS)
US Federal
407-221-9544 Cell / Text Msg
[email protected] email
[email protected] wrote: -----
From: [email protected]
Sent by: [email protected]
Date: 07/21/2016 09:13AM
Subject: gpfsug-discuss Digest, Vol 54, Issue 48
[email protected]
To subscribe or unsubscribe via the World Wide Web, visit
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
or, via email, send a message with subject or body 'help' to
[email protected]
You can reach the person managing the list at
[email protected]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of gpfsug-discuss digest..."
Today's Topics:
1. Re: NDS in Two Site scenario ([email protected])
----------------------------------------------------------------------
Message: 1
Date: Thu, 21 Jul 2016 13:12:58 +0000
From: "[email protected]" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
Message-ID: <[email protected]>
Content-Type: text/plain; charset="utf-8"
Thanks Vic & Simon, I'm totally cool with "it depends." The solution guidance is to achieve a highly available FS, and there is dark fibre between the two locations. FileNet is the application, and they want two things: the ability to write in both locations (maybe close to the same time, though not necessarily to the same files) and protection against any site failure. So in my mind my Scenario 1 would work as long as I had copies=2 and restripes are acceptable. In my Scenario 2 I would still have to restripe if the SAN in site 1 went down.
I'm looking for the simplest approach that provides the greatest availability.
From: <[email protected]> on behalf of "Simon Thompson (Research Computing - IT Services)" <[email protected]>
Reply-To: gpfsug main discussion list <[email protected]>
Date: Thursday, July 21, 2016 at 8:02 AM
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
It depends.
What are you protecting against?
Either will work depending on your acceptable failure modes. I'm assuming here that you are using copies=2 to replicate the data, and that the NSD devices have different failure groups per site.
In the second example, if you were to lose the NSD servers in Site 1, but not the SAN, you would continue to have 2 copies of data written as the NSD servers in Site 2 could write to the SAN in Site 1.
In the first example you would need to restripe the file system when bringing Site 1 back online to ensure data is replicated.
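In practice, Simon's restripe step might look like the following (the file system name is a placeholder; the commands are the standard disk-recovery pair):

```
# Illustrative only: once Site 1's disks are reachable again,
# resume them and re-replicate any data written while they were down.
mmchdisk fsname start -a
mmrestripefs fsname -r
```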
Simon
From: <[email protected]<mailto:[email protected]>> on behalf of "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>
Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>
Date: Thursday, 21 July 2016 at 13:45
To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>>
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
This is where my confusion sits. So if I have two sites, and two NSD nodes per site with 1 NSD each (to keep it simple), do I just present the physical LUN in Site1 to the Site1 NSD nodes and the physical LUN in Site2 to the Site2 NSD nodes? Or do I present the physical LUN in Site1 to all 4 NSD nodes, and the same for Site2? (Assuming SAN and not direct-attached in this case.) I know I'm being persistent, but this for some reason confuses me.
Site1
  NSD Node1
    --- NSD1 --- Physical LUN1 from SAN1
  NSD Node2
Site2
  NSD Node3
    --- NSD2 --- Physical LUN2 from SAN2
  NSD Node4

Or

Site1
  NSD Node1
    --- NSD1 --- Physical LUN1 from SAN1
    --- NSD2 --- Physical LUN2 from SAN2
  NSD Node2
Site2
  NSD Node3
    --- NSD2 --- Physical LUN2 from SAN2
    --- NSD1 --- Physical LUN1 from SAN1
  NSD Node4
Site3
  Node5 (quorum)
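A hedged stanza-file sketch of the second layout (all LUNs visible to all four NSD servers, one failure group per site, plus the third-site quorum/descriptor disk; device paths, NSD names, and node names are illustrative):

```
# Illustrative stanza file for the two-site layout plus a third site.
cat > /tmp/twosite.stanza <<'EOF'
%nsd: device=/dev/lun1 nsd=nsd1 servers=node1,node2,node3,node4 usage=dataAndMetadata failureGroup=1
%nsd: device=/dev/lun2 nsd=nsd2 servers=node3,node4,node1,node2 usage=dataAndMetadata failureGroup=2
%nsd: device=/dev/sdb  nsd=nsd3 servers=node5 usage=descOnly failureGroup=3
EOF
```

The order of each servers list sets preference, so listing the site-local NSD servers first keeps I/O local while still allowing cross-site access if they fail.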
From: <[email protected]<mailto:[email protected]>> on behalf of Ken Hill <[email protected]<mailto:[email protected]>>
Reply-To: gpfsug main discussion list <[email protected]<mailto:[email protected]>>
Date: Wednesday, July 20, 2016 at 7:02 PM
To: gpfsug main discussion list <[email protected]<mailto:[email protected]>>
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
Yes - it is a cluster.
The sites should NOT be farther apart than a MAN or campus network. If you're looking to do this over a larger distance, it would be best to choose another GPFS solution (multi-cluster, AFM, etc.).
Regards,
Ken Hill
Technical Sales Specialist | Software Defined Solution Sales
IBM Systems
________________________________
Phone: 1-540-207-7270
E-mail: [email protected]
[cid:[email protected]]<http://www.ibm.com/us-en/> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/lsf/> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/high-performance-services/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/symphony/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/> [cid:[email protected]] <http://www-01.ibm.com/software/tivoli/csi/cloud-storage/> [cid:[email protected]] <http://www-01.ibm.com/software/tivoli/csi/backup-recovery/> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/tape/ltfs/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/scale/>
[cid:[email protected]] <https://www.ibm.com/marketplace/cloud/object-storage/us/en-us>
2300 Dulles Station Blvd
Herndon, VA 20171-6133
United States
From: "[email protected]" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 07/20/2016 07:33 PM
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
Sent by: [email protected]
________________________________
So in this scenario, Ken, can Server3 see any disks in Site1?
From: <[email protected]<mailto:[email protected]>> on behalf of Ken Hill <[email protected]<mailto:[email protected]>>
Reply-To: gpfsug main discussion list <[email protected]<mailto:[email protected]>>
Date: Wednesday, July 20, 2016 at 4:15 PM
To: gpfsug main discussion list <[email protected]<mailto:[email protected]>>
Subject: Re: [gpfsug-discuss] NDS in Two Site scenario
Site1                    Site2
  Server1 (quorum 1)       Server3 (quorum 2)
  Server2                  Server4

SiteX
  Server5 (quorum 3)
You need to set up another site (or server) that is at least power-isolated (if not completely infrastructure-isolated) from Site1 and Site2. You would then set up a quorum node at that site or location. This ensures you can still access your data even if one of your sites goes down.
You can further isolate against failure by increasing the number of quorum nodes (keeping the count odd).
The way quorum works is that a majority of the quorum nodes need to be up to survive an outage:
- With 3 quorum nodes you can have 1 quorum node failure and continue file system operations.
- With 5 quorum nodes you can have 2 quorum node failures and continue file system operations.
- With 7 quorum nodes you can have 3 quorum node failures and continue file system operations.
- And so on.
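Ken's majority rule reduces to simple arithmetic; as a sketch (not a GPFS tool):

```shell
# Majority quorum: with n quorum nodes, floor((n - 1) / 2)
# may fail before the cluster loses quorum.
quorum_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

quorum_tolerance 5   # prints 2
```

Note that an even count buys nothing: 4 quorum nodes tolerate only 1 failure, the same as 3, which is why odd numbers are recommended.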
Please see http://www.ibm.com/support/knowledgecenter/en/STXKQY_4.2.0/ibmspectrumscale42_content.html?view=kc for more information about quorum and tiebreaker disks.
Ken Hill
Technical Sales Specialist | Software Defined Solution Sales
IBM Systems
________________________________
Phone: 1-540-207-7270
E-mail: [email protected]
[cid:[email protected]]<http://www.ibm.com/us-en/> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/lsf/> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/high-performance-services/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/platformcomputing/products/symphony/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/> [cid:[email protected]] <http://www-01.ibm.com/software/tivoli/csi/cloud-storage/> [cid:[email protected]] <http://www-01.ibm.com/software/tivoli/csi/backup-recovery/> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/tape/ltfs/index.html> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/> [cid:[email protected]] <http://www-03.ibm.com/systems/storage/spectrum/scale/>
[cid:[email protected]] <https://www.ibm.com/marketplace/cloud/object-storage/us/en-us>
2300 Dulles Station Blvd
Herndon, VA 20171-6133
United States
From: "[email protected]" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 07/20/2016 04:47 PM
Subject: [gpfsug-discuss] NDS in Two Site scenario
Sent by: [email protected]
________________________________
For some reason this concept is a round peg that doesn't fit the square hole inside my brain. Can someone please explain the best practice for setting up two sites in the same cluster? I get that I would likely have two NSD nodes in site 1 and two NSD nodes in site 2. What I don't understand are the failure scenarios and what would happen if I lose one node, or worse, a whole site goes down. Do I solve this by having Scale replication set to 2 for all my files? I mean, a single site I think I get; it's the two-datacenter case, where I typically don't want two clusters, that confuses me.
Mark R. Bush| Solutions Architect
Mobile: 210.237.8415 | [email protected]
Sirius Computer Solutions | www.siriuscom.com
10100 Reunion Place, Suite 500, San Antonio, TX 78216
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
------------------------------
End of gpfsug-discuss Digest, Vol 54, Issue 48
**********************************************
