We noticed some odd behavior recently. I have a customer with a small Scale (with Archive on top) configuration that we recently updated to a dual-node configuration. We are using CES and set up a very small 3-NSD shared-root filesystem (gpfssr). We also set up tiebreaker disks and figured it would be OK to use the gpfssr NSDs for that purpose.
When we tried to perform some basic failover testing, both nodes came down. From the logs, it appears that when we initiated the node failure (via the mmshutdown command... not great, I know), the shared-root filesystem was unmounted and remounted. When that happened, the cluster lost access to the tiebreaker disks, decided it had lost quorum, and the other node came down as well.

We got around this by moving the tiebreaker disks to our other, normal GPFS filesystem. After that, failover worked as expected. This is documented nowhere as far as I could find. I wanted to know if anybody else had experienced this and whether it is expected behavior. All is well now and operating as we want, so I don't think we'll pursue a support request.

Regards,

SHAUN ANDERSON
STORAGE ARCHITECT
O 208.577.2112
M 214.263.7014
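For anyone hitting the same thing, a minimal sketch of the workaround described above. The NSD names below (data_nsd1, etc.) are hypothetical placeholders; substitute NSDs belonging to a filesystem other than the CES shared root. Check your Scale version's documentation for whether tiebreakerDisks can be changed online in your release:

```shell
# Check which disks are currently configured as tiebreakers
mmlsconfig tiebreakerDisks

# Move the tiebreaker role off the shared-root (gpfssr) NSDs onto NSDs
# from a regular GPFS filesystem (NSD names here are hypothetical)
mmchconfig tiebreakerDisks="data_nsd1;data_nsd2;data_nsd3"

# Verify the configuration took and that quorum is healthy
mmlsconfig tiebreakerDisks
mmgetstate -a
```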
_______________________________________________ gpfsug-discuss mailing list gpfsug-discuss at spectrumscale.org http://gpfsug.org/mailman/listinfo/gpfsug-discuss
