> Date: Tue, 29 Jul 2008 17:27:38 -0400
> From: [EMAIL PROTECTED]
> To: [email protected]
> Subject: [Linux-HA] resources don't migrate when node is declared dead?!?
>
> My cluster contains 2 active/passive nodes with one drbd master/slave
> resource and one group resource which itself contains 7 resources. I
> want the m/s and group to be colocated, and when the master loses its
> ping the slave should be promoted, but nothing happened when I
> pulled the ethernet cable... Here's what the constraints look like in
> the cib right now:
>
> <constraints>
> <rsc_order id="drbd-before-group_id" from="group_id"
> action="start" to="ms-drbd_id" to_action="promote"/>
This order constraint reads: action=start, from=group_id, type=after,
to_action=promote, to=ms-drbd_id, i.e. group_id is started after
ms-drbd_id is promoted. This is OK.
> <rsc_colocation id="group-on-drbd_id" to="ms-drbd_id"
> to_role="master" from="group_id" score="infinity"/>
A colocation constraint makes resource 'from' run on the same machine as
resource 'to': group_id runs on the same machine as ms-drbd_id in the
master role. This is OK.
> <rsc_location id="drbd_id:connected" rsc="ms-drbd_id">
> <rule role="master" id="drbd_id:connected:rule"
> score_attribute="pingd">
> <expression id="drbd_id:connected-rule-1" attribute="pingd"
> operation="defined"/>
> </rule>
With score_attribute="pingd" and a pingd scaling factor of 100 (your
pingd is started with -m 100), connectivity to one ping node is worth
100, to two nodes 200, and so on. If you have no connectivity, the pingd
attribute is 0, so this rule contributes no score at all.
> </rsc_location>
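Note that this rule only adds a positive preference while pingd is
defined; when connectivity is lost, pingd is set to 0 (as your log shows
with "pingd=0"), the attribute is still defined, and nothing pushes the
master away. A commonly used companion rule (just a sketch, the ids are
made up, not taken from your cib) forbids the master role on a node
without connectivity:

```xml
<rsc_location id="drbd_id:no-connectivity" rsc="ms-drbd_id">
  <!-- -INFINITY keeps the master role off a node where pingd is
       missing or zero; boolean_op="or" covers both cases -->
  <rule id="drbd_id:no-connectivity:rule" role="master"
        score="-INFINITY" boolean_op="or">
    <expression id="drbd_id:no-connectivity:undef"
                attribute="pingd" operation="not_defined"/>
    <expression id="drbd_id:no-connectivity:zero"
                attribute="pingd" operation="lte" value="0" type="number"/>
  </rule>
</rsc_location>
```

With a rule like this, losing the ping node should demote the master
instead of leaving it in place.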
> <rsc_location id="cli-prefer-mysql_id" rsc="mysql_id">
> <rule id="cli-prefer-rule-mysql_id" score="INFINITY">
> <expression id="cli-prefer-expr-mysql_id" attribute="#uname"
> operation="eq" value="feeble-1" type="string"/>
> </rule>
If group_id must run on the same machine as ms-drbd_id, you cannot also
pin mysql_id to feeble-1 with an INFINITY score, because mysql_id
belongs to the group group_id: group_id follows ms-drbd_id, and
ms-drbd_id is supposed to run on the node with the best connectivity.
This location constraint conflicts with that.
> </rsc_location>
> <rsc_location id="cli-prefer-drbd_id:0" rsc="drbd_id:0">
> <rule id="cli-prefer-rule-drbd_id:0" score="INFINITY">
> <expression id="cli-prefer-expr-drbd_id:0"
> attribute="#uname" operation="eq" value="feeble-0" type="string"/>
> </rule>
> </rsc_location>
> <rsc_location id="cli-prefer-drbd_id:1" rsc="drbd_id:1">
> <rule id="cli-prefer-rule-drbd_id:1" score="INFINITY">
> <expression id="cli-prefer-expr-drbd_id:1"
> attribute="#uname" operation="eq" value="feeble-0" type="string"/>
> </rule>
> </rsc_location>
> </constraints>
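The cli-prefer-* constraints above look like leftovers from
"crm_resource --migrate" (that is what creates constraints with the
cli-prefer- prefix). Assuming that is where they came from, something
like the following should clear them (a sketch; check the ids against
your own cib first):

```shell
# Un-migrate removes the cli-prefer constraint for the resource:
crm_resource -U -r mysql_id
# Or delete the constraints from the cib directly:
cibadmin -D -X '<rsc_location id="cli-prefer-drbd_id:0"/>'
cibadmin -D -X '<rsc_location id="cli-prefer-drbd_id:1"/>'
```

Once those INFINITY location constraints are gone, the pingd-based rule
can actually decide where the master runs.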
>
>
> heartbeat ha.cf config file:
>
> mcast eth0 239.0.0.1 694 1 0
> bcast eth1
> deadping 20
> deadtime 10
> ping 132.206.178.1
> baud 115200
> serial /dev/ttyS0
> node feeble-0 feeble-1
> auto_failback off
> use_logd on
> respawn hacluster /usr/lib/heartbeat/dopd
> apiauth dopd gid=haclient uid=hacluster
> respawn root /usr/lib/heartbeat/pingd -m 100 -d 5s
> apiauth mgmtd uid=root
> respawn root /usr/lib/heartbeat/mgmtd -v
>
>
> After reconnecting I see in the ha.log:
>
>
> heartbeat[2371]: 2008/07/29_16:10:12 info: Link 132.206.178.1:132.206.178.1
> dead.
> pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status
> update: Ping node 132.206.178.1 now has status [dead]
> pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status
> update: Ping node 132.206.178.1 now has status [dead]
> pingd[2523]: 2008/07/29_16:10:12 info: send_update: 0 active ping nodes
> heartbeat[2371]: 2008/07/29_16:10:12 info: Link feeble-0:eth0 dead.
> pingd[2523]: 2008/07/29_16:10:12 notice: pingd_lstatus_callback: Status
> update: Ping node feeble-0 now has status [dead]
> pingd[2523]: 2008/07/29_16:10:12 notice: pingd_nstatus_callback: Status
> update: Ping node feeble-0 now has status [dead]
> attrd[2529]: 2008/07/29_16:10:17 info: attrd_trigger_update: Sending flush op
> to all hosts for: pingd
> attrd[2529]: 2008/07/29_16:10:17 info: attrd_ha_callback: flush message from
> feeble-1
> attrd[2529]: 2008/07/29_16:10:17 info: attrd_perform_update: Sent update 13:
> pingd=0
> tengine[2543]: 2008/07/29_16:10:17 info: extract_event: Aborting on
> transient_attributes changes for d7fb07f0-a857-446d-98e6-fce91c1b6094
> tengine[2543]: 2008/07/29_16:10:17 info: update_abort_priority: Abort
> priority upgraded to 1000000
> tengine[2543]: 2008/07/29_16:10:17 info: te_update_diff: Aborting on
> transient_attributes deletions
> crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
> S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE
> origin=route_message ]
> crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: All 2 cluster
> nodes are eligible to run resources.
> pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node
> feeble-1 is online
> pengine[2544]: 2008/07/29_16:10:17 info: determine_online_status: Node
> feeble-0 is online
> pengine[2544]: 2008/07/29_16:10:17 info: unpack_find_resource: Internally
> renamed drbd_id:0 on feeble-0 to drbd_id:1
> pengine[2544]: 2008/07/29_16:10:17 notice: clone_print: Master/Slave Set:
> ms-drbd_id
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: drbd_id:0
> (ocf::heartbeat:drbd): Master feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: drbd_id:1
> (ocf::heartbeat:drbd): Slave feeble-0
> pengine[2544]: 2008/07/29_16:10:17 notice: group_print: Resource Group:
> group_id
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: fs_id
> (ocf::heartbeat:Filesystem): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: nfs_kernel-id
> (ocf::bic:nfs-kernel): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: nfs_common-id
> (ocf::bic:nfs-common): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: ip_id
> (ocf::heartbeat:IPaddr): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: mysql_id
> (ocf::heartbeat:mysql): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: apache_id
> (ocf::heartbeat:apache): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 notice: native_print: email_id
> (ocf::heartbeat:MailTo): Started feeble-1
> pengine[2544]: 2008/07/29_16:10:17 info: master_color: Promoting drbd_id:0
> (Master feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1
> instances of a possible 1 to master
> pengine[2544]: 2008/07/29_16:10:17 info: master_color: ms-drbd_id: Promoted 1
> instances of a possible 1 to master
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> drbd_id:0 (Master feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> drbd_id:1 (Slave feeble-0)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> drbd_id:0 (Master feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> drbd_id:1 (Slave feeble-0)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource fs_id
> (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> nfs_kernel-id (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> nfs_common-id (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource ip_id
> (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> mysql_id (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> apache_id (Started feeble-1)
> pengine[2544]: 2008/07/29_16:10:17 notice: NoRoleChange: Leave resource
> email_id (Started feeble-1)
> crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
> S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS
> cause=C_IPC_MESSAGE origin=route_messag
> e ]
> tengine[2543]: 2008/07/29_16:10:17 info: process_te_message: Processing graph
> derived from /var/lib/heartbeat/pengine/pe-input-7.bz2
> tengine[2543]: 2008/07/29_16:10:17 info: unpack_graph: Unpacked transition
> 29: 0 actions in 0 synapses
> tengine[2543]: 2008/07/29_16:10:17 info: run_graph: Transition 29:
> (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0)
> tengine[2543]: 2008/07/29_16:10:17 info: notify_crmd: Transition 29 status:
> te_complete - <null>
> crmd[2530]: 2008/07/29_16:10:17 info: do_state_transition: State transition
> S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE
> origin=route_message ]
> pengine[2544]: 2008/07/29_16:10:17 info: process_pe_message: Transition 29:
> PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-7.bz2
> heartbeat[2371]: 2008/07/29_16:10:22 WARN: node 132.206.178.1: is dead
> crmd[2530]: 2008/07/29_16:10:22 notice: crmd_ha_status_callback: Status
> update: Node 132.206.178.1 now has status [dead]
> pingd[2523]: 2008/07/29_16:10:22 notice: pingd_nstatus_callback: Status
> update: Ping node 132.206.178.1 now has status [dead]
> pingd[2523]: 2008/07/29_16:10:22 info: send_update: 0 active ping nodes
> crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for
> 132.206.178.1
> crmd[2530]: 2008/07/29_16:10:22 info: crm_update_peer: Creating entry for
> node 132.206.178.1/0/0
> crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer: Triggered
> assert at membership.c:161 : uuid != NULL
> crmd[2530]: 2008/07/29_16:10:22 ERROR: crm_abort: crm_update_peer_proc:
> Triggered assert at membership.c:263 : node != NULL
> crmd[2530]: 2008/07/29_16:10:22 WARN: get_uuid: Could not calculate UUID for
> 132.206.178.1
> heartbeat[2371]: 2008/07/29_16:14:50 info: Link feeble-0:eth0 up.
> pingd[2523]: 2008/07/29_16:14:50 notice: pingd_lstatus_callback: Status
> update: Ping node feeble-0 now has status [up]
> pingd[2523]: 2008/07/29_16:14:50 notice: pingd_nstatus_callback: Status
> update: Ping node feeble-0 now has status [up]
> heartbeat[2371]: 2008/07/29_16:14:51 info: Link 132.206.178.1:132.206.178.1
> up.
> heartbeat[2371]: 2008/07/29_16:14:51 WARN: Late heartbeat: Node
> 132.206.178.1: interval 290020 ms
> heartbeat[2371]: 2008/07/29_16:14:51 info: Status update for node
> 132.206.178.1: status ping
> crmd[2530]: 2008/07/29_16:14:51 notice: crmd_ha_status_callback: Status
> update: Node 132.206.178.1 now has status [ping]
> pingd[2523]: 2008/07/29_16:14:51 notice: pingd_lstatus_callback: Status
> update: Ping node 132.206.178.1 now has status [up]
> pingd[2523]: 2008/07/29_16:14:51 notice: pingd_nstatus_callback: Status
> update: Ping node 132.206.178.1 now has status [up]
>
>
> Must be something obvious but I can't find it.
>
> There used to be a neat script to calculate scores and stickiness but
> I upgraded heartbeat to 2.1.3-18 with the deb packages on opensuse.org
> and with pacemaker-0.6.5-1 and now showscores.sh gives me
>
> ~# showscores.sh
> Resource Score Node Stickiness #Fail
> Fail-Stickiness
> -1000000_(master) -INFINITY ptest[11262]: 100 -1001
> 1000000_(master) INFINITY ptest[11262]: 100 -1001
> 100_(master) 100 ptest[11262]: 100 -1001
> 175_(master) 175 ptest[11262]: 100 -1001
> 76_(master) 76 ptest[11262]: 100 -1001
I don't understand this output: where are the resource and node names?
>
> Any ideas?
Before pulling the ethernet cable, try:
cibadmin -Q -o nodes
Can you see all the nodes of your cluster? Today in my cluster only one
of the two nodes was recognized.
I don't think I have helped a lot, since I don't have much experience
with the cib.xml either, but I hope the comments are at least useful to
you.
>
> regards,
> jf
> --
> <° ><
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems