Dominik Klein writes:
> I read your email on the pacemaker list and from what you've shared and
> explained, I cannot spot a configuration issue. It should just work
> like that (and does work like that for me).
I did more experiments and noticed that migration-threshold=N doesn't
work the way I thought it would. I assumed that if starting a resource
fails N times, the resource's group migrates to the other node. What
happens instead is that if N is 3, for example, and I stop the resource
(e.g., the mysql server) three times, Pacemaker restarts it twice on the
original node and on the third stop migrates the resources to the other
node, even though every start succeeded.
Is there a way to trigger the migration only when the start itself has
failed N times?
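For what it's worth, the fail count that migration-threshold is compared
against can be watched per resource and node. A sketch, assuming crm_mon
in this version supports the --failcounts option:

```shell
# one-shot cluster status, including per-node resource fail counts
crm_mon -1 -f
```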
> Maybe post your entire configuration, preferably a hb_report
> archive.
I think I had a bug in my crm during the earlier tests: I had set
migration-threshold on an individual resource (mysql-server)

crm_resource --meta --resource mysql-server \
    --set-parameter migration-threshold --property-value 3

instead of on the whole group. Now I have

group mysql-server-group fs0 virtual-ip mysql-server \
    meta migration-threshold="3"

and migration of the resources takes place after the third start. The
complete config is below.
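The equivalent crm_resource invocation against the group (an untested
sketch, mirroring the command I used above on the primitive) should be:

```shell
# set migration-threshold on the whole group rather than the primitive
crm_resource --meta --resource mysql-server-group \
    --set-parameter migration-threshold --property-value 3
```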
The real problem is that Pacemaker stops starting the mysql server
altogether after a few manual stops (/etc/init.d/mysql stop).
Here is an example. I stop mysql, and all the other resources are
started on the other node except the mysql server:
crmd[9940]: 2009/03/23_19:33:23 info: send_direct_ack: ACK'ing resource op
drbd0:0_monitor_60000 from 5:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc:
lrm_invoke-lrmd-1237829603-11
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing
key=59:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:23 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing
key=61:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_monitor_60000 (call=31, rc=-2, cib-update=0, confirmed=true) Cancelled
unknown exec error
lrmd[9937]: 2009/03/23_19:33:24 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=32, rc=0, cib-update=49, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=33, rc=0, cib-update=50, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing
key=62:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:26 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing
key=13:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_promote_0 )
crm_master[13804]: 2009/03/23_19:33:26 info: Invoked: /usr/sbin/crm_master -l
reboot -v 75
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:notify:stdout) 0
Trying master-drbd0:0=75 update via attrd
lrmd[9937]: 2009/03/23_19:33:27 info: rsc:drbd0:0: promote
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=34, rc=0, cib-update=51, confirmed=true) complete ok
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:promote:stdout)
drbd[13811]: 2009/03/23_19:33:27 INFO: drbd0 promote: primary succeeded
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation
drbd0:0_promote_0 (call=35, rc=0, cib-update=52, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:29 info: do_lrm_rsc_op: Performing
key=60:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:29 info: rsc:drbd0:0: notify
crm_master[13983]: 2009/03/23_19:33:29 info: Invoked: /usr/sbin/crm_master -l
reboot -v 75
lrmd[9937]: 2009/03/23_19:33:29 info: RA output: (drbd0:0:notify:stdout) 0
Trying master-drbd0:0=75 update via attrd
crmd[9940]: 2009/03/23_19:33:29 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=36, rc=0, cib-update=53, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing
key=44:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=fs0_start_0 )
lrmd[9937]: 2009/03/23_19:33:31 info: rsc:fs0: start
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing
key=14:8:8:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_monitor_59000 )
Filesystem[13990]: 2009/03/23_19:33:31 INFO: Running start for /dev/drbd0
on /var/lib/mysql
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation
drbd0:0_monitor_59000 (call=38, rc=8, cib-update=54, confirmed=false) complete
master
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation
fs0_start_0 (call=37, rc=0, cib-update=55, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:33 info: do_lrm_rsc_op: Performing
key=46:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=virtual-ip_start_0 )
lrmd[9937]: 2009/03/23_19:33:33 info: rsc:virtual-ip: start
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: ip -f inet addr add 192.98.102.10/24
brd 192.98.102.255 dev eth1
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: ip link set eth1 up
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: /usr/lib/heartbeat/send_arp -i 200 -r
5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.98.102.10 eth1
192.98.102.10 auto not_used not_used
crmd[9940]: 2009/03/23_19:33:33 info: process_lrm_event: LRM operation
virtual-ip_start_0 (call=39, rc=0, cib-update=56, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:35 info: do_lrm_rsc_op: Performing
key=47:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=virtual-ip_monitor_21000 )
crmd[9940]: 2009/03/23_19:33:35 info: process_lrm_event: LRM operation
virtual-ip_monitor_21000 (call=40, rc=0, cib-update=57, confirmed=false)
complete ok
As you see, there is nothing in the log about the mysql server; it looks
like Pacemaker has ignored it completely. crm_mon -1 shows:
============
Last updated: Mon Mar 23 19:39:28 2009
Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
2 Nodes configured.
2 Resources configured.
============
Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): online
Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online
Master/Slave Set: ms-drbd0
drbd0:0 (ocf::heartbeat:drbd): Master lenny1
drbd0:1 (ocf::heartbeat:drbd): Slave lenny2
Resource Group: mysql-server-group
fs0 (ocf::heartbeat:Filesystem): Started lenny1
virtual-ip (ocf::heartbeat:IPaddr2): Started lenny1
mysql-server (lsb:mysql): Stopped
Failed actions:
mysql-server_monitor_10000 (node=lenny2, call=27, rc=7, status=complete): not running
mysql-server_monitor_10000 (node=lenny1, call=22, rc=7, status=complete): not running
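My suspicion is that the fail count has reached migration-threshold on
both lenny1 and lenny2, so the resource has no node left it is allowed
to run on. If that is the case, clearing the failure history should make
the cluster try the start again. A sketch using crm_resource's
-C/--cleanup option:

```shell
# forget mysql-server's failure history on all nodes so the cluster
# re-evaluates where (and whether) to start it
crm_resource -C -r mysql-server
```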
-- juha
-------------------------------------------------------------------------
node $id="8df8447f-6ecf-41a7-a131-c89fd59a120d" lenny1
node $id="f13aff7b-6c94-43ac-9a24-b118e62d5325" lenny2
primitive drbd0 ocf:heartbeat:drbd \
params drbd_resource="drbd0" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="60s" role="Slave" timeout="30s"
primitive fs0 ocf:heartbeat:Filesystem \
params fstype="ext3" directory="/var/lib/mysql" device="/dev/drbd0" \
meta target-role="Started"
primitive virtual-ip ocf:heartbeat:IPaddr2 \
params ip="192.98.102.10" broadcast="192.98.102.255" nic="eth1" cidr_netmask="24" \
op monitor interval="21s" timeout="5s"
primitive mysql-server lsb:mysql \
op monitor interval="10s" timeout="30s" start-delay="10s"
group mysql-server-group fs0 virtual-ip mysql-server \
meta migration-threshold="3"
ms ms-drbd0 drbd0 \
meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
location ms-drbd0-master-on-lenny1 ms-drbd0 \
rule $id="ms-drbd0-master-on-lenny1-rule" $role="master" 100: #uname eq lenny1
colocation mysql-server-group-on-ms-drbd0 inf: mysql-server-group ms-drbd0:Master
order ms-drbd0-before-mysql-server-group inf: ms-drbd0:promote mysql-server-group:start
property $id="cib-bootstrap-options" \
dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
default-resource-stickiness="1"
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems