My cluster has two nodes in an active/passive setup, with one drbd
master/slave resource and one group resource which itself contains 7
resources. I want the m/s resource and the group to be colocated, and
when the master loses its ping connectivity the slave should be
promoted. But nothing happened when I pulled the ethernet cable...
Here's what the constraints look like in the cib right now:
<constraints>
  <rsc_order id="drbd-before-group_id" from="group_id"
      action="start" to="ms-drbd_id" to_action="promote"/>
  <rsc_colocation id="group-on-drbd_id" to="ms-drbd_id"
      to_role="master" from="group_id" score="infinity"/>
  <rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
    <rule role="master" id="drbd_id:connected:rule"
        score_attribute="pingd">
      <expression id="drbd_id:connected-rule-1" attribute="pingd"
          operation="defined"/>
    </rule>
  </rsc_location>
  <rsc_location id="cli-prefer-mysql_id" rsc="mysql_id">
    <rule id="cli-prefer-rule-mysql_id" score="INFINITY">
      <expression id="cli-prefer-expr-mysql_id"
          attribute="#uname" operation="eq" value="feeble-1" type="string"/>
    </rule>
  </rsc_location>
  <rsc_location id="cli-prefer-drbd_id:0" rsc="drbd_id:0">
    <rule id="cli-prefer-rule-drbd_id:0" score="INFINITY">
      <expression id="cli-prefer-expr-drbd_id:0"
          attribute="#uname" operation="eq" value="feeble-0" type="string"/>
    </rule>
  </rsc_location>
  <rsc_location id="cli-prefer-drbd_id:1" rsc="drbd_id:1">
    <rule id="cli-prefer-rule-drbd_id:1" score="INFINITY">
      <expression id="cli-prefer-expr-drbd_id:1"
          attribute="#uname" operation="eq" value="feeble-0" type="string"/>
    </rule>
  </rsc_location>
</constraints>
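To make the location rules easier to compare at a glance, here is a
minimal Python sketch that lists each rule's resource, role, score
source and expression. The XML embedded in it is a trimmed copy of two
of the entries above, and the function name summarize_location_rules is
my own invention, not part of heartbeat or pacemaker.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of two rsc_location entries from the constraints section.
CONSTRAINTS = """
<constraints>
  <rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
    <rule role="master" id="drbd_id:connected:rule" score_attribute="pingd">
      <expression id="drbd_id:connected-rule-1" attribute="pingd"
          operation="defined"/>
    </rule>
  </rsc_location>
  <rsc_location id="cli-prefer-drbd_id:0" rsc="drbd_id:0">
    <rule id="cli-prefer-rule-drbd_id:0" score="INFINITY">
      <expression id="cli-prefer-expr-drbd_id:0"
          attribute="#uname" operation="eq" value="feeble-0" type="string"/>
    </rule>
  </rsc_location>
</constraints>
"""


def summarize_location_rules(xml_text):
    """Return one (resource, role, score, condition) tuple per rule expression.

    Hypothetical helper for reading a constraints section; it only
    inspects the XML and does not talk to the cluster.
    """
    rows = []
    root = ET.fromstring(xml_text)
    for loc in root.iter("rsc_location"):
        for rule in loc.iter("rule"):
            # A rule has either a fixed score or one read from a node attribute.
            if "score" in rule.attrib:
                score = rule.get("score")
            else:
                score = "attr:" + rule.get("score_attribute", "?")
            for expr in rule.iter("expression"):
                parts = [expr.get("attribute"), expr.get("operation"),
                         expr.get("value")]
                condition = " ".join(p for p in parts if p)
                rows.append((loc.get("rsc"), rule.get("role", "any"),
                             score, condition))
    return rows


if __name__ == "__main__":
    for row in summarize_location_rules(CONSTRAINTS):
        print("%-12s %-7s %-12s %s" % row)
```

Running it against the full constraints section should show the
pingd-based rule on ms-drbd_id next to the fixed INFINITY cli-prefer
rules, which is the comparison I am trying to reason about.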
heartbeat ha.cf config file:
mcast eth0 239.0.0.1 694 1 0
bcast eth1
deadping 20
deadtime 10
ping 132.206.178.1
baud 115200
serial /dev/ttyS0
node feeble-0 feeble-1
auto_failback off
use_logd on
respawn hacluster /usr/lib/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster
respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
apiauth mgmtd uid=root
respawn root /usr/lib/heartbeat/mgmtd -v
After reconnecting, I see this in the ha.log:
heartbeat[2371]: 2008/07/29_16:10:12 info: Link 132.206.178.1:132.206.178.1
dead.
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status update:
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status update:
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 info: send_update: 0 active ping nodes
heartbeat[2371]: 2008/07/29_16:10:12 info: Link feeble-0:eth0 dead.
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status update:
Ping node feeble-0 now has status [dead]
pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status update:
Ping node feeble-0 now has status [dead]
attrd[2529]: 2008/07/29_16:10:17 info: attrd_trigger_update: Sending flush op
to all hosts for: pingd
attrd[2529]: 2008/07/29_16:10:17 info: attrd_ha_callback: flush message from
feeble-1
attrd[2529]: 2008/07/29_16:10:17 info: attrd_perform_update: Sent update 13:
pingd=0
tengine[2543]: 2008/07/29_16:10:17 info: extract_event: Aborting on
transient_attributes changes for d7fb07f0-a857-446d-98e6-fce91c1b6094
tengine[2543]: 2008/07/29_16:10:17 info: update_abort_priority: Abort priority
upgraded to 1000000
tengine[2543]: 2008/07/29_16:10:17 info: te_update_diff: Aborting on
transient_attributes deletions
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
origin=route_message ]
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: All 2 cluster nodes
are eligible to run resources.
pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node feeble-1
is online
pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node feeble-0
is online
pengine[2544]: 2008/07/29_16:10:17 info: unpack_find_resource: Internally
renamed drbd_id:0 on feeble-0 to drbd_id:1
pengine[2544]: 2008/07/29_16:10:17 notice: clone_print: Master/Slave Set:
ms-drbd_id
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: drbd_id:0
(ocf::heartbeat:drbd): Master feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: drbd_id:1
(ocf::heartbeat:drbd): Slave feeble-0
pengine[2544]: 2008/07/29_16:10:17 notice: group_print: Resource Group: group_id
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: fs_id
(ocf::heartbeat:Filesystem): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: nfs_kernel-id
(ocf::bic:nfs-kernel): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: nfs_common-id
(ocf::bic:nfs-common): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: ip_id
(ocf::heartbeat:IPaddr): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: mysql_id
(ocf::heartbeat:mysql): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: apache_id
(ocf::heartbeat:apache): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 notice: native_print: email_id
(ocf::heartbeat:MailTo): Started feeble-1
pengine[2544]: 2008/07/29_16:10:17 info: master_color: Promoting drbd_id:0
(Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1
instances of a possible 1 to master
pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1
instances of a possible 1 to master
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
drbd_id:0 (Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
drbd_id:1 (Slave feeble-0)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
drbd_id:0 (Master feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
drbd_id:1 (Slave feeble-0)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource fs_id
(Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
nfs_kernel-id (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
nfs_common-id (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource ip_id
(Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
mysql_id (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
apache_id (Started feeble-1)
pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
email_id (Started feeble-1)
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[2543]: 2008/07/29_16:10:17 info: process_te_message: Processing graph
derived from /var/lib/heartbeat/pengine/pe-input-7.bz2
tengine[2543]: 2008/07/29_16:10:17 info: unpack_graph: Unpacked transition 29:
0 actions in 0 synapses
tengine[2543]: 2008/07/29_16:10:17 info: run_graph: Transition 29: (Complete=0,
Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[2543]: 2008/07/29_16:10:17 info: notify_crmd: Transition 29 status:
te_complete - <null>
crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
pengine[2544]: 2008/07/29_16:10:17 info: process_pe_message: Transition 29:
PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-7.bz2
heartbeat[2371]: 2008/07/29_16:10:22 WARN: node 132.206.178.1: is dead
crmd[2530]: 2008/07/29_16:10:22 notice: crmd_ha_status_callback: Status update:
Node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:22 notice: pingd_nstatus_callback: Status update:
Ping node 132.206.178.1 now has status [dead]
pingd[2523]: 2008/07/29_16:10:22 info: send_update: 0 active ping nodes
crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for
132.206.178.1
crmd[2530]: 2008/07/29_16:10:22 info: crm_update_peer: Creating entry for node
132.206.178.1/0/0
crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer: Triggered
assert at membership.c:161 : uuid != NULL
crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer_proc:
Triggered assert at membership.c:263 : node != NULL
crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for
132.206.178.1
heartbeat[2371]: 2008/07/29_16:14:50 info: Link feeble-0:eth0 up.
pingd[2523]: 2008/07/29_16:14:50 notice: pingd_lstatus_callback: Status update:
Ping node feeble-0 now has status [up]
pingd[2523]: 2008/07/29_16:14:50 notice: pingd_nstatus_callback: Status update:
Ping node feeble-0 now has status [up]
heartbeat[2371]: 2008/07/29_16:14:51 info: Link 132.206.178.1:132.206.178.1 up.
heartbeat[2371]: 2008/07/29_16:14:51 WARN: Late heartbeat: Node 132.206.178.1:
interval 290020 ms
heartbeat[2371]: 2008/07/29_16:14:51 info: Status update for node
132.206.178.1: status ping
crmd[2530]: 2008/07/29_16:14:51 notice: crmd_ha_status_callback: Status update:
Node 132.206.178.1 now has status [ping]
pingd[2523]: 2008/07/29_16:14:51 notice: pingd_lstatus_callback: Status update:
Ping node 132.206.178.1 now has status [up]
pingd[2523]: 2008/07/29_16:14:51 notice: pingd_nstatus_callback: Status update:
Ping node 132.206.178.1 now has status [up]
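The two lines that seem to matter most in all of this are the pingd
attribute update and the transition that followed it. Below is a rough
Python sketch for pulling just those out of a log like the one above.
The sample lines are copied from the log with the wrapped lines
rejoined; the function name key_events and the regular expressions are
my own assumptions about the line layout, not anything shipped with
heartbeat.

```python
import re

# Sample lines copied from the log above, with wrapped lines rejoined.
LOG = """\
pingd[2523]: 2008/07/29_16:10:12 info: send_update: 0 active ping nodes
attrd[2529]: 2008/07/29_16:10:17 info: attrd_perform_update: Sent update 13: pingd=0
tengine[2543]: 2008/07/29_16:10:17 info: unpack_graph: Unpacked transition 29: 0 actions in 0 synapses
"""

# daemon[pid]: timestamp level: message
LINE = re.compile(r"^(?P<daemon>\w+)\[\d+\]: (?P<stamp>\S+) \w+: (?P<msg>.*)$")
ATTR = re.compile(r"Sent update \d+: (\w+)=(\S+)")
GRAPH = re.compile(r"Unpacked transition (\d+): (\d+) actions")


def key_events(log_text):
    """Return (timestamp, kind, detail) for attribute updates and transitions."""
    events = []
    for line in log_text.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip continuation lines and anything unrecognized
        msg = m.group("msg")
        attr = ATTR.search(msg)
        graph = GRAPH.search(msg)
        if attr:
            events.append((m.group("stamp"), "attribute",
                           "%s=%s" % attr.groups()))
        elif graph:
            events.append((m.group("stamp"), "transition",
                           "#%s with %s actions" % graph.groups()))
    return events


if __name__ == "__main__":
    for stamp, kind, detail in key_events(LOG):
        print(stamp, kind, detail)
```

On the sample lines this reports pingd=0 followed by transition 29
with 0 actions, i.e. the attribute did change but the policy engine
decided there was nothing to do.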
It must be something obvious, but I can't find it.
There used to be a neat script to calculate scores and stickiness, but
I upgraded heartbeat to 2.1.3-18 (using the deb packages from
opensuse.org) together with pacemaker-0.6.5-1, and now showscores.sh
gives me this:
~# showscores.sh
Resource Score Node Stickiness #Fail Fail-Stickiness
-1000000_(master) -INFINITY ptest[11262]: 100 -1001
1000000_(master) INFINITY ptest[11262]: 100 -1001
100_(master) 100 ptest[11262]: 100 -1001
175_(master) 175 ptest[11262]: 100 -1001
76_(master) 76 ptest[11262]: 100 -1001
Any ideas?
regards,
jf
--
<° ><