[Linux-HA] confused in two node heartbeat cluster

Mia Lueng Tue, 30 Nov 2010 10:22:19 -0800

I've set up a two-node cluster on SLES 11 SP1, using sbd as the STONITH
device. Here is my configuration:
#crm configure show
node hbtest01 \
        attributes standby="off"
node hbtest02 \
        attributes standby="off"
primitive g_app ocf:heartbeat:apache \
        operations $id="g_app-operations" \
        op monitor interval="10" timeout="20s" \
        params configfile="/etc/apache2/httpd.conf" httpd="/usr/sbin/httpd2-prefork"
primitive ip0 ocf:heartbeat:IPaddr2 \
        operations $id="ip0-operations" \
        op monitor interval="10s" timeout="20s" \
        params ip="192.168.1.28" nic="bond0" cidr_netmask="24" iflabel="0"
primitive r_ping ocf:pacemaker:ping \
        operations $id="r_ping-operations" \
        op monitor interval="10" timeout="60" \
        params dampen="5" multiplier="100" host_list="192.168.2.254"
primitive r_sdb stonith:external/sbd \
        operations $id="r_sdb-operations" \
        op monitor interval="10" timeout="15" on-fail="restart" start-delay="15" \
        params sbd_device="/dev/sdc1"
group g_apache ip0 g_app \
        meta target-role="Started"
clone PING r_ping \
        meta target-role="Started"
clone SDB r_sdb \
        meta target-role="Started"
location l_conn g_apache \
        rule $id="l_conn-rule" -inf: not_defined pingd or pingd lte 0
property $id="cib-bootstrap-options" \
        dc-version="1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5" \
        cluster-infrastructure="openais" \
        expected-quorum-votes="2" \
        last-lrm-refresh="1291137636"


#crm_mon
============
Last updated: Tue Nov 30 12:54:45 2010
Stack: openais
Current DC: hbtest01 - partition with quorum
Version: 1.1.2-2e096a41a5f9e184a1c1537c82c6da1093698eb5
2 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ hbtest01 hbtest02 ]

 Clone Set: SDB
     Started: [ hbtest01 hbtest02 ]
 Clone Set: PING
     Started: [ hbtest01 hbtest02 ]
 Resource Group: g_apache
     ip0        (ocf::heartbeat:IPaddr2):       Started hbtest02
     g_app      (ocf::heartbeat:apache):        Started hbtest02

When I unplug both NIC links of hbtest02 (they are bonded), I expect the
resource group g_apache to be taken over by node hbtest01. But nothing
happens. When I re-plug the links, g_apache restarts on hbtest02.

And if I set no-quorum-policy to suicide, I found that when I unplugged the
NIC links of hbtest02, hbtest01 and hbtest02 tried to fence each other via
the sbd STONITH device, and hbtest01 was the lucky one. But g_apache was
still not taken over.
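
For reference, that property can be set from the crm shell with a command
along these lines. This is only a sketch of the form; the exact invocation
used on these nodes is an assumption, and the property lands in the
cib-bootstrap-options set shown above:

```shell
# Assumption: run as root on one cluster node; the change is cluster-wide.
crm configure property no-quorum-policy="suicide"

# Confirm that the property now appears in the configuration.
crm configure show | grep no-quorum-policy
```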

Is there any network tiebreaker configuration to determine which node
should be fenced in this two-node split-brain situation? In my case,
since all NIC links of hbtest02 are unplugged, the resources should be taken
over by the node with the healthy network link.

Thanks
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
