Re: [Linux-HA] Who takes care of the failover?

Michael Schwartzkopff Wed, 25 Nov 2009 00:07:31 -0800

On Wednesday, 25 November 2009 09:00:35, RaSca wrote:
> Hi everybody,
> I'm trying to do some tests with heartbeat and pacemaker (with
> ubuntu-server 9.10, heartbeat 2.99.2+sles11r9-5ubuntu1 and
> pacemaker-heartbeat 1.0.5+hg20090813-0ubuntu4). This is my configuration:
>
> node $id="2ee6e25d-8bd6-42ba-a2c5-6bc98b6f4715" nas-1 \
>       attributes standby="off"
> node $id="4ea6f84c-841a-4272-903c-e14ad4baefe4" nas-2 \
>       attributes standby="off"
> primitive drbd0 ocf:linbit:drbd \
>       params drbd_resource="r0" \
>       op monitor interval="15s" \
>       meta target-role="Started"
> primitive drbd1 ocf:linbit:drbd \
>       params drbd_resource="r1" \
>       op monitor interval="15s" \
>       meta target-role="Started"
> primitive fs_hafs ocf:heartbeat:Filesystem \
>       params device="/dev/drbd0" directory="/hafs" fstype="ext3" \
>       meta target-role="Started"
> primitive fs_mysql ocf:heartbeat:Filesystem \
>       params device="/dev/drbd1" directory="/mysql" fstype="ext3" \
>       meta target-role="Started"
> primitive ip_hafs ocf:heartbeat:IPaddr2 \
>       params ip="192.168.1.80" nic="eth0"
> primitive ip_mysql ocf:heartbeat:IPaddr2 \
>       params ip="192.168.1.81" nic="eth0"
> primitive mysql-server lsb:mysql
> group hafs fs_hafs ip_hafs
> group mysql fs_mysql ip_mysql mysql-server
> ms ms_drbd0 drbd0 \
>       meta master-max="1" master-node-max="1" clone-max="2" \
>       clone-node-max="1" notify="true"
> ms ms_drbd1 drbd1 \
>       meta master-max="1" master-node-max="1" clone-max="2" \
>       clone-node-max="1" notify="true"
> location cli-prefer-hafs hafs \
>       rule $id="cli-prefer-rule-hafs" inf: #uname eq nas-1
> location cli-prefer-mysql mysql \
>       rule $id="cli-prefer-rule-mysql" inf: #uname eq nas-2
> colocation hafs_on_drbd inf: hafs ms_drbd0:Master
> colocation mysql_on_drbd inf: mysql ms_drbd1:Master
> order hafs_after_drbd inf: ms_drbd0:promote hafs:start
> order mysql_after_drbd inf: ms_drbd1:promote mysql:start
> property $id="cib-bootstrap-options" \
>       dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
>       cluster-infrastructure="Heartbeat" \
>       stonith-enabled="false" \
>       last-lrm-refresh="1259075046"
>
> The two machines are connected to the LAN via eth0 and to each
> other with a double crossover cable on a bonded interface named bond1
> that works in balance-rr mode.
> This is the heartbeat configuration:
>
> crm on
>
> use_logd no
> debugfile /var/log/ha.debug
> logfile       /var/log/ha.log
> logfacility   local0
>
> keepalive 2
> deadtime 10
> warntime 5
> initdead 15
>
> ucast eth0 192.168.1.79
> ucast eth0 192.168.1.77
> ucast bond1 10.0.0.1
> ucast bond1 10.0.0.2


It doesn't make sense for a node to talk to itself. In a two-node cluster
you need to make the nodes talk to each other, so only one ucast line is needed.

> auto_failback on
>
> node  nas-1
> node  nas-2
>
> ping_group lan gateway pdc

Using the ping... options in heartbeat is more or less deprecated, so run the
network tests as a pingd resource instead.
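
As a rough sketch only: assuming the ocf:pacemaker:pingd agent is available
in your Pacemaker package and that the hosts "gateway" and "pdc" from the
ping_group line resolve on both nodes, a cloned pingd resource might look
like the fragment below. The resource names, the multiplier and the
intervals are illustrative placeholders, not tested values, and the # lines
are annotations for the reader that can be dropped before loading.

```
# Hypothetical names and values; adjust to your environment.
primitive pingd ocf:pacemaker:pingd \
      params host_list="gateway pdc" multiplier="100" \
      op monitor interval="15s" timeout="20s"
# Run one copy of pingd on every node.
clone pingd_clone pingd \
      meta globally-unique="false"
```

Each node then maintains a node attribute (named "pingd" by default) whose
value is the multiplier times the number of ping hosts it can reach.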

> Everything works fine (I can move and migrate resources and put a node in
> standby mode) until I force a node to fail, for example by
> removing the ethernet cable. What I can't understand is why, even though
> the log shows the cable failure, the crm does not move any resource.
> For example, if I'm in this situation:
>
> Master/Slave Set: ms_drbd0
>          Masters: [ nas-1 ]
>          Slaves: [ nas-2 ]
> Resource Group: hafs
>      fs_hafs     (ocf::heartbeat:Filesystem):    Started nas-1
>      ip_hafs     (ocf::heartbeat:IPaddr2):       Started nas-1
> Master/Slave Set: ms_drbd1
>          Masters: [ nas-2 ]
>          Slaves: [ nas-1 ]
> Resource Group: mysql
>      fs_mysql    (ocf::heartbeat:Filesystem):    Started nas-2
>      ip_mysql    (ocf::heartbeat:IPaddr2):       Started nas-2
>      mysql-server        (lsb:mysql):    Started nas-2
>
> and then I remove the ethernet cable of the nas-2 node, I see this
> message in the log:
>
> Nov 24 17:41:45 nas-1 heartbeat: [1489]: info: Link nas-2:eth0 dead.
>
> but no other action is taken by the crm or heartbeat itself...
>
> What's wrong with my reasoning? What am I missing?
>
> Thanks for your help.

How is Pacemaker supposed to know that the network is broken? And how did you
tell it to react to failures? Please see:
http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node
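
A minimal sketch of such a rule, modelled on that example and assuming a
cloned pingd resource already maintains the default node attribute "pingd"
on both nodes (the constraint IDs are made up for illustration):

```
# Hypothetical constraint IDs; "pingd" is the assumed attribute name.
location hafs_on_connected_node hafs \
      rule -inf: not_defined pingd or pingd lte 0
location mysql_on_connected_node mysql \
      rule -inf: not_defined pingd or pingd lte 0
```

With constraints like these, a group is not allowed to run on a node that
cannot reach any ping host. Because the groups here are colocated with the
DRBD master, the rule may need to be placed on the Master role of ms_drbd0
and ms_drbd1 instead; this is untested against the configuration above.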

-- 
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: [email protected]
web: www.multinet.de

Registered office: 85630 Grasbrunn
Court of registration: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42