We have a pair of servers managed by Pacemaker, with two DRBD resources, one
for each of two applications. Everything works fine when the primary node is
put into standby or power cycled: the DRBD Primary role moves to the new active
node and the applications continue to run as expected. The problem appears when
I pull the Ethernet cable out of the primary node and leave it unplugged for
about half an hour. When I unplug it, the Primary moves as expected and the
applications continue to work. However, when I plug the Ethernet cable back in,
both nodes go into a StandAlone state.
Node 1:
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res cs ro ds p mounted fstype
0:r0 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext4
1:r1 StandAlone Primary/Unknown UpToDate/DUnknown r----- ext4
Node 2:
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
srcversion: F97798065516C94BE0F27DC
m:res cs ro ds p mounted fstype
0:r0 StandAlone Secondary/Unknown UpToDate/DUnknown r-----
1:r1 StandAlone Secondary/Unknown UpToDate/DUnknown r-----
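The same state can also be read per resource with `drbdadm cstate` and `drbdadm role`. A minimal sketch, assuming the resource names r0 and r1 from this post; the function names and the `DRBDADM` override are hypothetical and only there so the check can be dry-run:

```shell
#!/bin/sh
# DRBDADM can be overridden (for example with a stub) for testing.
DRBDADM="${DRBDADM:-drbdadm}"

# Print "<resource> <connection state> <role>" for each resource given.
report_state() {
    for res in "$@"; do
        cstate=$("$DRBDADM" cstate "$res")
        role=$("$DRBDADM" role "$res")
        printf '%s %s %s\n' "$res" "$cstate" "$role"
    done
}

# Succeed if at least one of the given resources is StandAlone.
any_standalone() {
    for res in "$@"; do
        if [ "$("$DRBDADM" cstate "$res")" = "StandAlone" ]; then
            return 0
        fi
    done
    return 1
}
```

For example, `report_state r0 r1` on either node, or `any_standalone r0 r1 && echo "needs attention"`.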
As you can see, one node knows it is Primary, and that is the node the
applications continue to run on. The second node knows it should be Secondary.
All I do to resolve this is connect the resources on each node, with the
Secondary using the --discard-my-data option.
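A minimal sketch of that manual recovery, assuming DRBD 8.4 `drbdadm` syntax and the resource names r0 and r1 from this post. The function names and the `DRBDADM` override are hypothetical, added only so the sequence can be dry-run (for example with `DRBDADM=echo`):

```shell
#!/bin/sh
# DRBDADM can be overridden, e.g. DRBDADM=echo, to print instead of execute.
DRBDADM="${DRBDADM:-drbdadm}"

# Run on the node whose changes are thrown away (the Secondary here).
recover_victim() {
    for res in "$@"; do
        "$DRBDADM" secondary "$res"
        "$DRBDADM" connect --discard-my-data "$res"
    done
}

# Run on the node whose data is kept (the Primary here).
recover_survivor() {
    for res in "$@"; do
        "$DRBDADM" connect "$res"
    done
}
```

Usage would be `recover_victim r0 r1` on the Secondary, then `recover_survivor r0 r1` on the Primary.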
Is there a way to have the connects done automatically? This looks to be a type
of "split brain", and I do have that configured in global_common.conf:
global {
    usage-count no;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    handlers {
        # These are EXAMPLE handlers only.
        # They may have severe implications,
        # like hard resetting the node under certain circumstances.
        # Be careful when chosing your poison.
        # pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        # local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
        # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
    }
    startup {
        # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
    }
    options {
        # cpu-mask on-no-data-accessible
    }
    disk {
        # size max-bio-bvecs on-io-error fencing disk-barrier disk-flushes
        # disk-drain md-flushes resync-rate resync-after al-extents
        # c-plan-ahead c-delay-target c-fill-target c-max-rate
        # c-min-rate disk-timeout
    }
    net {
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        # after-sb-2pri consensus;
        after-sb-2pri disconnect;
        # protocol timeout max-epoch-size max-buffers unplug-watermark
        # connect-int ping-int sndbuf-size rcvbuf-size ko-count
        # allow-two-primaries cram-hmac-alg shared-secret after-sb-0pri
        # after-sb-1pri after-sb-2pri always-asbp rr-conflict
        # ping-timeout data-integrity-alg tcp-cork on-congestion
        # congestion-fill congestion-extents csums-alg verify-alg
        # use-rle
    }
}
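One way to check which after-sb policies DRBD actually parsed is to filter the output of `drbdadm dump`. A minimal sketch; the function name and the `DRBDADM` override are hypothetical, and the exact layout of the dump output may differ between versions:

```shell
#!/bin/sh
# DRBDADM can be overridden (for example with a stub) for testing.
DRBDADM="${DRBDADM:-drbdadm}"

# Print only the automatic split-brain recovery policies from the
# configuration as parsed by drbdadm.
show_sb_policies() {
    "$DRBDADM" dump all | grep -E 'after-sb-[012]pri'
}
```

If the three `after-sb-*` lines from the net section above do not show up in the result, the settings are not being picked up from the file I think they are.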
The following are also the resource files:
r0.res:
resource r0 {
    on Node1 {
        volume 0 {
            device /dev/drbd0;
            disk /dev/Node1-vg/AOS;
            flexible-meta-disk internal;
        }
        address 10.0.6.221:7788;
    }
    on Node2 {
        volume 0 {
            device /dev/drbd0;
            disk /dev/Node2-vg/AOS;
            flexible-meta-disk internal;
        }
        address 10.0.6.222:7788;
    }
}
r1.res:
resource r1 {
    on Node1 {
        volume 0 {
            device /dev/drbd1;
            disk /dev/Node1-vg/Controller;
            flexible-meta-disk internal;
        }
        address 10.0.6.221:7789;
    }
    on Node2 {
        volume 0 {
            device /dev/drbd1;
            disk /dev/Node2-vg/Controller;
            flexible-meta-disk internal;
        }
        address 10.0.6.222:7789;
    }
}
I am not sure if this is possible, but I figured I would ask.
Thanks,
Keith
Keith Ouellette