[DRBD-user] Stuck in Standalone

Keith Ouellette Thu, 16 Apr 2015 08:38:01 -0700

We have a pair of servers managed by Pacemaker, with two DRBD resources for
two different applications. Everything works fine when the primary node is put
into standby or power cycled: the DRBD Primary role moves to the new active
node and the applications continue to run as expected. The issue appears when
I pull the Ethernet cable out of the primary node and leave it unplugged for
about half an hour. When I unplug it, the Primary role moves as expected and
the applications continue to work. However, when I plug the Ethernet cable
back in, both nodes go into the StandAlone state.


Node 1:

drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res  cs          ro               ds                 p       mounted  fstype
0:r0   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4
1:r1   StandAlone  Primary/Unknown  UpToDate/DUnknown  r-----  ext4

Node 2:

drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res  cs          ro                 ds                 p       mounted  fstype
0:r0   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----
1:r1   StandAlone  Secondary/Unknown  UpToDate/DUnknown  r-----

As you can see, one node knows it is Primary, and that is the node the
applications continue to run on. The second node knows it should be Secondary.
All I do to resolve this is connect the resources on each node, with the
Secondary using the --discard-my-data option.
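
The manual recovery described above can be sketched as the shell functions
below. This is only a sketch under assumptions: it assumes the DRBD 8.4
drbdadm syntax and the resource names r0 and r1 from this post. The function
names and the DRBDADM variable are hypothetical and exist only for
illustration. Running the first function throws away the local changes on the
node where it is run, so it must only be used on the node whose data is to be
discarded.

```shell
#!/bin/sh
# Sketch of the manual split-brain recovery described in the post.
# DRBDADM is a hypothetical override so the commands can be dry-run,
# e.g. DRBDADM="echo drbdadm".

# Run on the node whose changes are to be discarded (the Secondary here).
discard_local_and_connect() {
    cmd=${DRBDADM:-drbdadm}
    for res in "$@"; do
        # Make sure the resource is Secondary before discarding its data.
        $cmd secondary "$res" || return 1
        # Reconnect and let the peer overwrite the local changes.
        $cmd connect --discard-my-data "$res" || return 1
    done
}

# Run on the node whose data is kept (the Primary here), if it is StandAlone.
connect_survivor() {
    cmd=${DRBDADM:-drbdadm}
    for res in "$@"; do
        $cmd connect "$res" || return 1
    done
}
```

With the resources from this post, that would be "discard_local_and_connect
r0 r1" on the Secondary and "connect_survivor r0 r1" on the Primary.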

Is there a way to have the connects done automatically? This looks to be a type
of "split brain", and I do have that configured in global_common.conf:

global {
        usage-count no;
        # minor-count dialog-refresh disable-ip-verification
}
common {
        handlers {
                # These are EXAMPLE handlers only.
                # They may have severe implications,
                # like hard resetting the node under certain circumstances.
                # Be careful when chosing your poison.
                # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
                # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
                # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
                split-brain "/usr/lib/drbd/notify-split-brain.sh root";
                # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
                # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
                # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
        }
        startup {
                # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
        }
        options {
                # cpu-mask on-no-data-accessible
        }
        disk {
                # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
                # disk-drain md-flushes resync-rate resync-after al-extents
                # c-plan-ahead c-delay-target c-fill-target c-max-rate
                # c-min-rate disk-timeout
        }
        net {
                after-sb-0pri discard-zero-changes;
                after-sb-1pri discard-secondary;
                # after-sb-2pri consensus;
                after-sb-2pri disconnect;
                # protocol timeout max-epoch-size max-buffers unplug-watermark
                # connect-int ping-int sndbuf-size rcvbuf-size ko-count
                # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
                # after-sb-1pri after-sb-2pri always-asbp rr-conflict
                # ping-timeout data-integrity-alg tcp-cork on-congestion
                # congestion-fill congestion-extents csums-alg verify-alg
                # use-rle
        }
}

The resource files are as follows:

r0.res:

resource r0 {
        on Node1 {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/Node1-vg/AOS;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.221:7788;
        }
        on Node2 {
                volume 0 {
                        device          /dev/drbd0;
                        disk            /dev/Node2-vg/AOS;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.222:7788;
        }
}

r1.res:

resource r1 {
        on Node1 {
                volume 0 {
                        device          /dev/drbd1;
                        disk            /dev/Node1-vg/Controller;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.221:7789;
        }
        on Node2 {
                volume 0 {
                        device          /dev/drbd1;
                        disk            /dev/Node2-vg/Controller;
                        flexible-meta-disk      internal;
                }
                address         10.0.6.222:7789;
        }
}

I am not sure if this is possible, but I figured I would ask.

Thanks,
Keith
