Re: [Linux-HA] split brain problem

Lars Ellenberg Tue, 20 Sep 2011 10:15:35 -0700

On Mon, Sep 12, 2011 at 08:19:10AM +0200, Willi Fehler wrote:
> On 19.07.11 02:35, Andrew Beekhof wrote:
> > On Sat, Jul 16, 2011 at 7:31 PM, Willi Fehler<[email protected]> wrote:
> >> Hi,
> >>
> >> I've installed a Pacemaker/OpenAIS/Corosync/DRBD/MySQL Cluster on
> >> CentOS6. (VirtualBox)
> >> If I start both nodes at the same time, I always get a split brain
> > "Split brain" as in, corosync on the two nodes can't talk to one another?
> >
> >> situation. If I start
> >> one node and wait until it is promoted to DRBD Master, everything
> >> works. How can I tell Pacemaker which node should always become master?
> > a location constraint with role=Master
> >
> >> [root@linsrv001 ~]# crm configure show
> >> node linsrv001.willi-net.local
> >> node linsrv002.willi-net.local
> >> primitive drbd_mysql ocf:linbit:drbd \
> >>      params drbd_resource="r0" \
> >>      op monitor interval="15s"
> >> primitive fs_mysql ocf:heartbeat:Filesystem \
> >>      params device="/dev/drbd/by-res/r0" directory="/var/lib/mysql" \
> >>      fstype="xfs"
> >> primitive ip_mysql ocf:heartbeat:IPaddr2 \
> >>      params ip="192.168.2.92" nic="eth0"
> >> primitive mysqld lsb:mysql
> >> group mysql fs_mysql ip_mysql mysqld
> >> ms ms_drbd_mysql drbd_mysql \
> >>      meta master-max="1" master-node-max="1" clone-max="2" \
> >>      clone-node-max="1" notify="true"
> >> location cli-prefer-mysql mysql \
> >>      rule $id="cli-prefer-rule-mysql" inf: #uname eq \
> >>      linsrv001.willi-net.local
> >> colocation mysql_on_drbd inf: mysql ms_drbd_mysql:Master
> >> order mysql_after_drbd inf: ms_drbd_mysql:promote mysql:start
> >> property $id="cib-bootstrap-options" \
> >>      dc-version="1.1.2-f059ec7ced7a86f18e5490b67ebf4a0b963bccfe" \
> >>      cluster-infrastructure="openais" \
> >>      expected-quorum-votes="2" \
> >>      no-quorum-policy="ignore" \
> >>      stonith-enabled="false"
> >>
> >> My second question is: what happens if one node fails and I have to
> >> set up the whole node again? If I start OpenAIS/Corosync, what happens
> >> with the CIB? (Will the cluster configuration be transferred to the
> >> node?)
> >>
> >> Regards - Willi
> >>
> >> _______________________________________________
> >> Linux-HA mailing list
> >> [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> See also: http://linux-ha.org/ReportingProblems
> >>
> Hi,
> 
> the split brain is triggered by DRBD.


No.

DRBD certainly does not "trigger a split brain".

It _detects_ data divergence, and concludes that this probably is a
result of a previous split brain situation.

Normal operation:
   DRBD: Primary --- Connected --- Secondary

You probably experimented with putting one node in standby and bringing
it online again, or with putting both nodes in standby and then bringing
one of them online again, or something along those lines.


If a node is in standby, it stops all resources.  That is, not even a
DRBD Secondary is running as a replication target, which in turn means
that its data set is not being updated.

And pacemaker has no way to know that.

You then placed pacemaker in a situation where it would start
and even promote this data set, which _we_ know is outdated,
but pacemaker and DRBD did not know at that point.

This now caused the data sets to diverge.

Next time DRBD is able to actually do a handshake, it detects this, and
disconnects, logging these messages about split brain and so on.


What you need to do is configure DRBD with "fencing resource-only;" (or
"fencing resource-and-stonith;") and use a fence-peer handler, e.g.
"/usr/lib/drbd/crm-fence-peer.sh".  When a Primary loses its connection
to the peer, that handler is supposed to place a location constraint
that disallows promotion anywhere else.
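
As a rough sketch, the relevant parts of the resource definition could
look like the following.  This assumes DRBD 8.3/8.4 syntax, where
"fencing" lives in the disk section.  The after-resync-target handler
with crm-unfence-peer.sh is not mentioned above; it is the companion
script that removes the constraint again once the resync has finished.
Check the paths against your installed package.

```
resource r0 {
  disk {
    # Assumption: resource-only; resource-and-stonith is the stricter variant
    fencing resource-only;
  }
  handlers {
    # Called when the replication link to the peer is lost
    fence-peer          "/usr/lib/drbd/crm-fence-peer.sh";
    # Assumption: lifts the constraint after a successful resync
    after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
  }
  # "on <host> { ... }" sections stay as they are
}
```

After changing the configuration on both nodes, "drbdadm adjust r0"
should apply it to the running resource.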

To make this actually work for real replication link breakage, not only
for "node standby" exercises, you need to make sure that your cluster
has redundant communication paths.

If you have only nodes A and B,
then shut down A, hard, continue services on B for a while,
then shut down B as well, and now, while B is still down,
boot A: A has an outdated CIB, an outdated DRBD data set,
and no way of knowing this.

Sort of a "deferred" split brain.

DRBD in this situation can know that it _may_ be outdated, and if you
really want to go there, pacemaker and DRBD can be configured to rather
stay offline than go online with *potentially* out-of-date data.
But that's probably too much for this thread already.

This may be an educational read as well:
http://www.mail-archive.com/[email protected]/msg04312.html



> /etc/drbd.conf
> # You can find an example in  /usr/share/doc/drbd.../drbd.conf.example
> 
> include "drbd.d/global_common.conf";
> include "drbd.d/*.res";
> 
> resource r0 {
>    on linsrv001.willi-net.local {
>      address   10.10.10.1:7788;
>      device    /dev/drbd0;
>      disk      /dev/vg00/lv02;
>      meta-disk internal;
>    }
>    on linsrv002.willi-net.local {
>      address   10.10.10.2:7788;
>      device    /dev/drbd0;
>      disk      /dev/vg00/lv02;
>      meta-disk internal;
>    }
> 
> /etc/drbd.d/global_common.conf
> global {
>          usage-count yes;
>          # minor-count dialog-refresh disable-ip-verification
> }
> 
> common {
>          protocol C;
>          handlers {
>                  #pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                  #pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
>                  #local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
>                  # fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
>                  # split-brain "/usr/lib/drbd/notify-split-brain.sh root";
>                  # out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
>                  # before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
>                  # after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
>          }
> 
>          startup {
>                  # wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
>                  #wfc-timeout 0;
>                  #degr-wfc-timeout 120;
>          }
> 
>          disk {
>                  # on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
>                  # no-disk-drain no-md-flushes max-bio-bvecs
>          }
> 
>          net {
>                  # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
>                  # max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
>                  # after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
>          }
> 
>          syncer {
>                  # rate after al-extents use-rle cpu-mask verify-alg csums-alg
>                  rate 110M;
>          }
> }
> }
> 
> [root@linsrv001 ~]# grep split /var/log/messages
> Sep 12 06:16:21 linsrv001 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0
> Sep 12 06:16:21 linsrv001 kernel: block drbd0: helper command: /sbin/drbdadm initial-split-brain minor-0 exit code 0 (0x0)
> Sep 12 06:16:21 linsrv001 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0
> Sep 12 06:16:21 linsrv001 kernel: block drbd0: helper command: /sbin/drbdadm split-brain minor-0 exit code 0 (0x0)
> 
> Thank you for your support.
> 
> Regards - Willi

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
