Dominik Klein writes:
> I read your email on the pacemaker list and from what you've shared and
> explained, I cannot spot a configuration issue. It should just work
> like that (and does work like that for me).
I did more experiments and noticed that migration-threshold=N doesn't
work the way I thought it would. I assumed that if starting a resource
fails N times, the resource's group migrates to the other node. What
happens instead is that if N is 3, for example, and I stop the resource
(e.g., the mysql server) three times, Pacemaker restarts it twice on the
original node and on the third stop migrates the resources to the other
node, even though every start succeeded.
Is there a way to trigger the migration only when the start itself has
failed N times?
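For what it's worth, the fail count that migration-threshold is compared
against can be watched per resource and node. A sketch, assuming crm_mon
in this version supports the --failcounts option:

```shell
# one-shot cluster status, including per-node resource fail counts
crm_mon -1 -f
```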
> Maybe post your entire configuration, preferably a hb_report
> archive.
I think I had a bug in my crm during the earlier tests: I had set
migration-threshold on an individual resource (mysql-server)

crm_resource --meta --resource mysql-server \
    --set-parameter migration-threshold --property-value 3

instead of on the whole group. Now I have

group mysql-server-group fs0 virtual-ip mysql-server \
    meta migration-threshold="3"

and migration of the resources takes place after the third start. The
complete config is below.
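The equivalent crm_resource invocation against the group (an untested
sketch, mirroring the command I used above on the primitive) should be:

```shell
# set migration-threshold on the whole group rather than the primitive
crm_resource --meta --resource mysql-server-group \
    --set-parameter migration-threshold --property-value 3
```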
The real problem is that Pacemaker stops starting the mysql server
altogether after a few manual stops (/etc/init.d/mysql stop).
Here is an example. I stop mysql, and all the other resources are
started on the other node except the mysql server:
crmd[9940]: 2009/03/23_19:33:23 info: send_direct_ack: ACK'ing resource op
drbd0:0_monitor_60000 from 5:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc:
lrm_invoke-lrmd-1237829603-11
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing
key=59:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:23 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:23 info: do_lrm_rsc_op: Performing
key=61:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_monitor_60000 (call=31, rc=-2, cib-update=0, confirmed=true) Cancelled
unknown exec error
lrmd[9937]: 2009/03/23_19:33:24 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=32, rc=0, cib-update=49, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:24 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=33, rc=0, cib-update=50, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing
key=62:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:26 info: rsc:drbd0:0: notify
crmd[9940]: 2009/03/23_19:33:26 info: do_lrm_rsc_op: Performing
key=13:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_promote_0 )
crm_master[13804]: 2009/03/23_19:33:26 info: Invoked: /usr/sbin/crm_master -l
reboot -v 75
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:notify:stdout) 0
Trying master-drbd0:0=75 update via attrd
lrmd[9937]: 2009/03/23_19:33:27 info: rsc:drbd0:0: promote
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=34, rc=0, cib-update=51, confirmed=true) complete ok
lrmd[9937]: 2009/03/23_19:33:27 info: RA output: (drbd0:0:promote:stdout)
drbd[13811]: 2009/03/23_19:33:27 INFO: drbd0 promote: primary succeeded
crmd[9940]: 2009/03/23_19:33:27 info: process_lrm_event: LRM operation
drbd0:0_promote_0 (call=35, rc=0, cib-update=52, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:29 info: do_lrm_rsc_op: Performing
key=60:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_notify_0 )
lrmd[9937]: 2009/03/23_19:33:29 info: rsc:drbd0:0: notify
crm_master[13983]: 2009/03/23_19:33:29 info: Invoked: /usr/sbin/crm_master -l
reboot -v 75
lrmd[9937]: 2009/03/23_19:33:29 info: RA output: (drbd0:0:notify:stdout) 0
Trying master-drbd0:0=75 update via attrd
crmd[9940]: 2009/03/23_19:33:29 info: process_lrm_event: LRM operation
drbd0:0_notify_0 (call=36, rc=0, cib-update=53, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing
key=44:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=fs0_start_0 )
lrmd[9937]: 2009/03/23_19:33:31 info: rsc:fs0: start
crmd[9940]: 2009/03/23_19:33:31 info: do_lrm_rsc_op: Performing
key=14:8:8:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=drbd0:0_monitor_59000 )
Filesystem[13990]: 2009/03/23_19:33:31 INFO: Running start for /dev/drbd0
on /var/lib/mysql
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation
drbd0:0_monitor_59000 (call=38, rc=8, cib-update=54, confirmed=false) complete
master
crmd[9940]: 2009/03/23_19:33:31 info: process_lrm_event: LRM operation
fs0_start_0 (call=37, rc=0, cib-update=55, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:33 info: do_lrm_rsc_op: Performing
key=46:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=virtual-ip_start_0 )
lrmd[9937]: 2009/03/23_19:33:33 info: rsc:virtual-ip: start
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: ip -f inet addr add 192.98.102.10/24
brd 192.98.102.255 dev eth1
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: ip link set eth1 up
IPaddr2[14090]: 2009/03/23_19:33:33 INFO: /usr/lib/heartbeat/send_arp -i 200 -r
5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.98.102.10 eth1
192.98.102.10 auto not_used not_used
crmd[9940]: 2009/03/23_19:33:33 info: process_lrm_event: LRM operation
virtual-ip_start_0 (call=39, rc=0, cib-update=56, confirmed=true) complete ok
crmd[9940]: 2009/03/23_19:33:35 info: do_lrm_rsc_op: Performing
key=47:8:0:84c3fc98-c640-4a3f-b0ea-c1f17e5f73bc op=virtual-ip_monitor_21000 )
crmd[9940]: 2009/03/23_19:33:35 info: process_lrm_event: LRM operation
virtual-ip_monitor_21000 (call=40, rc=0, cib-update=57, confirmed=false)
complete ok
As you see, there is nothing in the log about the mysql server; it looks
like Pacemaker has ignored it completely. crm_mon -1 shows:
============
Last updated: Mon Mar 23 19:39:28 2009
Current DC: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325)
Version: 1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160
2 Nodes configured.
2 Resources configured.
============
Node: lenny1 (8df8447f-6ecf-41a7-a131-c89fd59a120d): online
Node: lenny2 (f13aff7b-6c94-43ac-9a24-b118e62d5325): online
Master/Slave Set: ms-drbd0
drbd0:0 (ocf::heartbeat:drbd): Master lenny1
drbd0:1 (ocf::heartbeat:drbd): Slave lenny2
Resource Group: mysql-server-group
fs0 (ocf::heartbeat:Filesystem): Started lenny1
virtual-ip (ocf::heartbeat:IPaddr2): Started lenny1
mysql-server (lsb:mysql): Stopped
Failed actions:
mysql-server_monitor_10000 (node=lenny2, call=27, rc=7, status=complete): not running
mysql-server_monitor_10000 (node=lenny1, call=22, rc=7, status=complete): not running
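My suspicion is that the fail count has reached migration-threshold on
both lenny1 and lenny2, so the resource has no node left it is allowed
to run on. If that is the case, clearing the failure history should make
the cluster try the start again. A sketch using crm_resource's
-C/--cleanup option:

```shell
# forget mysql-server's failure history on all nodes so the cluster
# re-evaluates where (and whether) to start it
crm_resource -C -r mysql-server
```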
-- juha
-------------------------------------------------------------------------
node $id="8df8447f-6ecf-41a7-a131-c89fd59a120d" lenny1
node $id="f13aff7b-6c94-43ac-9a24-b118e62d5325" lenny2
primitive drbd0 ocf:heartbeat:drbd \
params drbd_resource="drbd0" \
op monitor interval="59s" role="Master" timeout="30s" \
op monitor interval="60s" role="Slave" timeout="30s"
primitive fs0 ocf:heartbeat:Filesystem \
params fstype="ext3" directory="/var/lib/mysql" device="/dev/drbd0" \
meta target-role="Started"
primitive virtual-ip ocf:heartbeat:IPaddr2 \
params ip="192.98.102.10" broadcast="192.98.102.255" nic="eth1" cidr_netmask="24" \
op monitor interval="21s" timeout="5s"
primitive mysql-server lsb:mysql \
op monitor interval="10s" timeout="30s" start-delay="10s"
group mysql-server-group fs0 virtual-ip mysql-server \
meta migration-threshold="3"
ms ms-drbd0 drbd0 \
meta clone-max="2" notify="true" globally-unique="false" target-role="Started"
location ms-drbd0-master-on-lenny1 ms-drbd0 \
rule $id="ms-drbd0-master-on-lenny1-rule" $role="master" 100: #uname eq lenny1
colocation mysql-server-group-on-ms-drbd0 inf: mysql-server-group ms-drbd0:Master
order ms-drbd0-before-mysql-server-group inf: ms-drbd0:promote mysql-server-group:start
property $id="cib-bootstrap-options" \
dc-version="1.0.2-ec6b0bbee1f3aa72c4c2559997e675db6ab39160" \
default-resource-stickiness="1"
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems