[Linux-HA] resources don't migrate when node is declared dead?!?

Jean-Francois Malouin Tue, 29 Jul 2008 14:28:08 -0700

My cluster consists of 2 nodes (active/passive) with one drbd master/slave
resource and one group resource which itself contains 7 resources. I
want the m/s and the group to be colocated, and when the master loses its
ping the slave should be promoted, but nothing happened when I
pulled the ethernet cable... Here's what the constraints look like in
the cib right now:


     <constraints>
       <rsc_order id="drbd-before-group_id" from="group_id" action="start"
                  to="ms-drbd_id" to_action="promote"/>
       <rsc_colocation id="group-on-drbd_id" to="ms-drbd_id" to_role="master"
                       from="group_id" score="infinity"/>
       <rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
         <rule role="master" id="drbd_id:connected:rule"
               score_attribute="pingd">
           <expression id="drbd_id:connected-rule-1" attribute="pingd"
                       operation="defined"/>
         </rule>
       </rsc_location>
       <rsc_location id="cli-prefer-mysql_id" rsc="mysql_id">
         <rule id="cli-prefer-rule-mysql_id" score="INFINITY">
           <expression id="cli-prefer-expr-mysql_id" attribute="#uname"
                       operation="eq" value="feeble-1" type="string"/>
         </rule>
       </rsc_location>
       <rsc_location id="cli-prefer-drbd_id:0" rsc="drbd_id:0">
         <rule id="cli-prefer-rule-drbd_id:0" score="INFINITY">
           <expression id="cli-prefer-expr-drbd_id:0" attribute="#uname"
                       operation="eq" value="feeble-0" type="string"/>
         </rule>
       </rsc_location>
       <rsc_location id="cli-prefer-drbd_id:1" rsc="drbd_id:1">
         <rule id="cli-prefer-rule-drbd_id:1" score="INFINITY">
           <expression id="cli-prefer-expr-drbd_id:1" attribute="#uname"
                       operation="eq" value="feeble-0" type="string"/>
         </rule>
       </rsc_location>
     </constraints>
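
As a rough illustration only (plain Python, not heartbeat or pacemaker code), the drbd_id:connected rule above can be read like this: the rule applies on any node where the pingd attribute is defined, and the score it contributes for the master role is whatever value pingd holds there. The function name and the dictionary of node attributes are made up for this sketch, and the reading of score_attribute and operation="defined" is an assumption that should be checked against the documentation for the installed version.

```python
from typing import Optional


def connected_rule_score(node_attrs: dict) -> Optional[int]:
    """Hypothetical model of the drbd_id:connected rule.

    Assumption: the rule matches wherever 'pingd' is defined and then
    uses the attribute's value as the score (score_attribute="pingd").
    Returns None when the rule would not apply at all.
    """
    if "pingd" not in node_attrs:
        return None  # attribute undefined: the rule adds nothing
    return int(node_attrs["pingd"])


# pingd is started with -m 100 in ha.cf and there is a single ping node,
# so the attribute is assumed to be either 100 or 0.
cases = {
    "ping node reachable": {"pingd": "100"},
    "ping node dead": {"pingd": "0"},
    "pingd not reporting": {},
}
for label, attrs in cases.items():
    print(f"{label}: {connected_rule_score(attrs)}")
```

If that reading is right, a node that has lost its ping node still matches the rule, only with a score of 0, so the rule by itself lowers the preference for that node rather than forbidding the master role there.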


heartbeat ha.cf config file:

mcast eth0 239.0.0.1 694 1 0
bcast eth1 
deadping 20
deadtime 10
ping 132.206.178.1
baud 115200
serial /dev/ttyS0
node feeble-0 feeble-1
auto_failback off
use_logd on
respawn hacluster /usr/lib/heartbeat/dopd 
apiauth dopd gid=haclient uid=hacluster
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
apiauth mgmtd uid=root
respawn root /usr/lib/heartbeat/mgmtd -v
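
For reference, a small sketch (plain Python, not part of heartbeat) of the value pingd is expected to publish with this configuration, assuming the attribute is simply the -m multiplier times the number of ping nodes that currently answer. The names PING_NODES, MULTIPLIER and pingd_attribute are invented for the example; the ping node address and the multiplier are taken from the ha.cf above.

```python
# Values copied from ha.cf: one "ping" node and "pingd -m 100".
PING_NODES = ["132.206.178.1"]
MULTIPLIER = 100


def pingd_attribute(reachable: set) -> int:
    """Assumed pingd value: multiplier times the number of active ping nodes."""
    active = sum(1 for node in PING_NODES if node in reachable)
    return MULTIPLIER * active


print(pingd_attribute({"132.206.178.1"}))  # ping node answers
print(pingd_attribute(set()))              # cable pulled, no ping node answers
```

With a single ping node the attribute can only take two values, so a score derived from it only ever differs by 100 between a connected and a disconnected node.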


After reconnecting the cable, I see this in ha.log:


heartbeat[2371]: 2008/07/29_16:10:12 info: Link 132.206.178.1:132.206.178.1 
dead.
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status update: 
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status update: 
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 info: send_update: 0 active ping nodes
heartbeat[2371]: 2008/07/29_16:10:12 info: Link feeble-0:eth0 dead.
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status update: 
Ping node feeble-0 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status update: 
Ping node feeble-0 now has status [dead]
attrd[2529]: 2008/07/29_16:10:17 info: attrd_trigger_update: Sending flush op 
to all hosts for: pingd
attrd[2529]: 2008/07/29_16:10:17 info: attrd_ha_callback: flush message from 
feeble-1
attrd[2529]: 2008/07/29_16:10:17 info: attrd_perform_update: Sent update 13: 
pingd=0
tengine[2543]: 2008/07/29_16:10:17 info: extract_event: Aborting on 
transient_attributes changes for d7fb07f0-a857-446d-98e6-fce91c1b6094
tengine[2543]: 2008/07/29_16:10:17 info: update_abort_priority: Abort priority 
upgraded to 1000000
tengine[2543]: 2008/07/29_16:10:17 info: te_update_diff: Aborting on 
transient_attributes deletions
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition 
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE 
origin=route_message ]
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: All 2 cluster nodes 
are eligible to run resources.
pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node feeble-1 
is online
pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node feeble-0 
is online
pengine[2544]: 2008/07/29_16:10:17 info: unpack_find_resource: Internally 
renamed drbd_id:0 on feeble-0 to drbd_id:1
pengine[2544]: 2008/07/29_16:10:17 notice: clone_print: Master/Slave Set: 
ms-drbd_id
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     drbd_id:0  
(ocf::heartbeat:drbd):  Master feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     drbd_id:1  
(ocf::heartbeat:drbd):  Slave feeble-0
pengine[2544]: 2008/07/29_16:10:17 notice: group_print: Resource Group: group_id
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     fs_id      
(ocf::heartbeat:Filesystem):    Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     nfs_kernel-id      
(ocf::bic:nfs-kernel):  Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     nfs_common-id      
(ocf::bic:nfs-common):  Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     ip_id      
(ocf::heartbeat:IPaddr):        Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     mysql_id   
(ocf::heartbeat:mysql): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     apache_id  
(ocf::heartbeat:apache):        Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print:     email_id   
(ocf::heartbeat:MailTo):        Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 info: master_color: Promoting drbd_id:0 
(Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1 
instances of a possible 1 to master
pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1 
instances of a possible 1 to master
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
drbd_id:0       (Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
drbd_id:1       (Slave feeble-0)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
drbd_id:0       (Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
drbd_id:1       (Slave feeble-0)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource fs_id   
(Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
nfs_kernel-id   (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
nfs_common-id   (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource ip_id   
(Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
mysql_id        (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
apache_id       (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource 
email_id        (Started feeble-1)
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition 
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE 
origin=route_message ]
tengine[2543]: 2008/07/29_16:10:17 info: process_te_message: Processing graph 
derived from /var/lib/heartbeat/pengine/pe-input-7.bz2
tengine[2543]: 2008/07/29_16:10:17 info: unpack_graph: Unpacked transition 29: 
0 actions in 0 synapses
tengine[2543]: 2008/07/29_16:10:17 info: run_graph: Transition 29: (Complete=0, 
Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[2543]: 2008/07/29_16:10:17 info: notify_crmd: Transition 29 status: 
te_complete - <null>
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition 
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE 
origin=route_message ]
pengine[2544]: 2008/07/29_16:10:17 info: process_pe_message: Transition 29: 
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-7.bz2
heartbeat[2371]: 2008/07/29_16:10:22 WARN: node 132.206.178.1: is dead
crmd[2530]: 2008/07/29_16:10:22 notice: crmd_ha_status_callback: Status update: 
Node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:22 notice: pingd_nstatus_callback: Status update: 
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:22 info: send_update: 0 active ping nodes
crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for 
132.206.178.1
crmd[2530]: 2008/07/29_16:10:22 info: crm_update_peer: Creating entry for node 
132.206.178.1/0/0
crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer: Triggered 
assert at membership.c:161 : uuid != NULL
crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer_proc: 
Triggered assert at membership.c:263 : node != NULL
crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for 
132.206.178.1
heartbeat[2371]: 2008/07/29_16:14:50 info: Link feeble-0:eth0 up.
pingd[2523]: 2008/07/29_16:14:50 notice: pingd_lstatus_callback: Status update: 
Ping node feeble-0 now has status [up]
pingd[2523]: 2008/07/29_16:14:50 notice: pingd_nstatus_callback: Status update: 
Ping node feeble-0 now has status [up]
heartbeat[2371]: 2008/07/29_16:14:51 info: Link 132.206.178.1:132.206.178.1 up.
heartbeat[2371]: 2008/07/29_16:14:51 WARN: Late heartbeat: Node 132.206.178.1: 
interval 290020 ms
heartbeat[2371]: 2008/07/29_16:14:51 info: Status update for node 
132.206.178.1: status ping
crmd[2530]: 2008/07/29_16:14:51 notice: crmd_ha_status_callback: Status update: 
Node 132.206.178.1 now has status [ping]
pingd[2523]: 2008/07/29_16:14:51 notice: pingd_lstatus_callback: Status update: 
Ping node 132.206.178.1 now has status [up]
pingd[2523]: 2008/07/29_16:14:51 notice: pingd_nstatus_callback: Status update: 
Ping node 132.206.178.1 now has status [up]


Must be something obvious but I can't find it.

There used to be a neat script to calculate scores and stickiness, but
since I upgraded to heartbeat 2.1.3-18 (deb packages from opensuse.org)
with pacemaker-0.6.5-1, showscores.sh gives me this:

~# showscores.sh 
Resource            Score     Node            Stickiness #Fail Fail-Stickiness 
-1000000_(master)   -INFINITY ptest[11262]:   100        -1001                
1000000_(master)    INFINITY  ptest[11262]:   100        -1001
100_(master)        100       ptest[11262]:   100        -1001
175_(master)        175       ptest[11262]:   100        -1001
76_(master)         76        ptest[11262]:   100        -1001

Any ideas?

regards,
jf
-- 
<° ><
