Hello everybody,
I'm pretty new with heartbeat and i've got one problem.
The main objective of my tests were to be able of moving resources from
one node to one other easily, did it successfully with crm_resource.
The main problem i'm experiencing is that it seems that when i fire the
first node which holds the IPaddr, the node 2 gets the ip BUT releases
it immediatly.
The configuration :
node 1 : sntest1.dclux.loc
* eth0 : 192.168.1.166/24
* eth1 : 192.168.176.166/24
node 2 : sntest2.dclux.loc
* eth0 : 192.168.1.167/24
* eth1 : 192.168.176.166/24
Resource : 192.168.1.168/24 as an alias of eth0.
/etc/ha.d/ha.cf @sntest1 :
logfacility local0
logfile /var/log/ha-log
debugfile /var/log/ha-debug
keepalive 100ms
deadtime 250ms
warntime 500ms
initdead 120 # depend on your hardware
udpport 694
ping 192.168.176.167
bcast eth1
auto_failback off
node sntest1.dclux.loc
node sntest2.dclux.loc
#use_logd yes
compression bz2
compression_threshold 2
crm yes
/etc/ha.d/ha.cf @sntest2 :
logfacility local0
logfile /var/log/ha-log
debugfile /var/log/ha-debug
keepalive 100ms
deadtime 250ms
warntime 500ms
initdead 120 # depend on your hardware
udpport 694
ping 192.168.176.166
bcast eth1
auto_failback off
node sntest1.dclux.loc
node sntest2.dclux.loc
#use_logd yes
compression bz2
compression_threshold 2
crm yes
/var/lib/heartbeat/cib.xml :
<cib admin_epoch="0" node_fencing="yes" have_quorum="false"
ignore_dtd="false" num_peers="2" ccm_transition="3" generated="true"
dc_uuid="a3efd7cd-eca4-4ea5
-9bba-417225edb077" epoch="2" cib_feature_revision="1.3"
num_updates="30" cib-last-written="Mon Jun 18 20:49:43 2007">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="cib-bootstrap-options-symmetric-cluster"
name="symmetric-cluster" value="true"/>
<nvpair id="cib-bootstrap-options-no_quorum-policy"
name="no_quorum-policy" value="stop"/>
<nvpair
id="cib-bootstrap-options-default-resource-stickiness"
name="default-resource-stickiness" value="0"/>
<nvpair
id="cib-bootstrap-options-default-resource-failure-stickiness"
name="default-resource-failure-stickiness" value="0"/>
<nvpair id="cib-bootstrap-options-stonith-enabled"
name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-stonith-action"
name="stonith-action" value="reboot"/>
<nvpair id="cib-bootstrap-options-stop-orphan-resources"
name="stop-orphan-resources" value="true"/>
<nvpair id="cib-bootstrap-options-stop-orphan-actions"
name="stop-orphan-actions" value="true"/>
<nvpair id="cib-bootstrap-options-remove-after-stop"
name="remove-after-stop" value="false"/>
<nvpair id="cib-bootstrap-options-short-resource-names"
name="short-resource-names" value="true"/>
<nvpair id="cib-bootstrap-options-transition-idle-timeout"
name="transition-idle-timeout" value="5min"/>
<nvpair id="cib-bootstrap-options-default-action-timeout"
name="default-action-timeout" value="5s"/>
<nvpair id="cib-bootstrap-options-is-managed-default"
name="is-managed-default" value="true"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes>
<node id="db1c3a49-0f90-424a-8f09-c85af7ed83a8"
uname="sntest2.dclux.loc" type="normal"/>
<node id="a3efd7cd-eca4-4ea5-9bba-417225edb077"
uname="sntest1.dclux.loc" type="normal"/>
</nodes>
<resources>
<primitive class="ocf" id="IPaddr_192_168_1_168"
provider="heartbeat" type="IPaddr" failstop_type="stonith">
<operations>
<op id="IPaddr_192_168_1_168_mon" interval="5s"
name="monitor" timeout="5s"/>
</operations>
<instance_attributes id="IPaddr_192_168_1_168_inst_attr">
<attributes>
<nvpair id="IPaddr_192_168_1_168_attr_0" name="ip"
value="192.168.1.168"/>
<nvpair id="IPaddr_192_168_1_168_attr_1" name="netmask"
value="24"/>
<nvpair id="IPaddr_192_168_1_168_attr_2" name="nic"
value="eth0"/>
</attributes>
</instance_attributes>
</primitive>
</resources>
<constraints>
<rsc_location id="rsc_location_IPaddr_192_168_1_168"
rsc="IPaddr_192_168_1_168">
<rule id="prefered_location_IPaddr_192_168_1_168" score="100">
<expression attribute="#uname"
id="prefered_location_IPaddr_192_168_1_168_expr" operation="eq"
value="sntest1.dclux.loc"/>
</rule>
</rsc_location>
<rsc_location rsc="IPaddr_192_168_1_168"
id="cli-prefer-IPaddr_192_168_1_168">
<rule score="INFINITY" id="cli-prefer-rule-IPaddr_192_168_1_168">
<expression attribute="#uname" operation="eq" type="string"
id="cli-prefer-expr-IPaddr_192_168_1_168" value="sntest1.dclux.loc"/>
</rule>
</rsc_location>
</constraints>
</configuration>
</cib>
**** Time frame of the test ****
* heartbeat running, resource IPaddr_192_168_1_168 is on sntest1.
* i do by hand : "ifconfig eth1 down" on sntest1
here are the logs on sntest2 after this :
heartbeat[3457]: 2007/06/18_20:49:44 WARN: node 192.168.176.166: is dead
heartbeat[3457]: 2007/06/18_20:49:44 WARN: node sntest1.dclux.loc: is dead
crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status
update: Node 192.168.176.166 now has status [dead]
heartbeat[3457]: 2007/06/18_20:49:44 info: Link
192.168.176.166:192.168.176.166 dead.
heartbeat[3457]: 2007/06/18_20:49:44 info: Link sntest1.dclux.loc:eth1 dead.
crmd[3472]: 2007/06/18_20:49:44 WARN: get_uuid: Could not calculate UUID
for 192.168.176.166
crmd[3472]: 2007/06/18_20:49:44 info: crmd_ha_status_callback: Ping node
192.168.176.166 is dead
crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status
update: Node sntest1.dclux.loc now has status [dead]
cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Local-only Change
(client:3472, call: 34): 0.1.24 (ok)
cib[3693]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version
0.1.24 of the CIB to disk (digest: 4e309facbe71d928e0ec8b6e52a151f0)
tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff
(cib_update): 0.1.24 -> 0.1.24
tengine[3484]: 2007/06/18_20:49:44 WARN: match_down_event: No match for
shutdown action on a3efd7cd-eca4-4ea5-9bba-417225edb077
tengine[3484]: 2007/06/18_20:49:44 info: extract_event: Stonith/shutdown
of a3efd7cd-eca4-4ea5-9bba-417225edb077 not matched
tengine[3484]: 2007/06/18_20:49:44 info: update_abort_priority: Abort
priority upgraded to 1000000
tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Aborting on
transient_attributes deletions
crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition:
sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: All 2 cluster
nodes are eligable to run resources.
pengine[3485]: 2007/06/18_20:49:44 info: log_data_element:
process_pe_message: [generation] <cib admin_epoch="0" epoch="1"
num_updates="24" node_fencing="yes" generated="true" have_quorum="true"
ignore_dtd="false" num_peers="2" ccm_transition="2"
cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/>
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value 'stop' for cluster option 'no-quorum-policy'
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value '60s' for cluster option 'cluster-delay'
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value '-1' for cluster option 'pe-error-series-max'
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value '-1' for cluster option 'pe-warn-series-max'
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value '-1' for cluster option 'pe-input-series-max'
pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default
value 'true' for cluster option 'startup-fencing'
pengine[3485]: 2007/06/18_20:49:44 info: determine_online_status: Node
sntest2.dclux.loc is online
pengine[3485]: 2007/06/18_20:49:44 info: native_print:
IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Stopped
pengine[3485]: 2007/06/18_20:49:44 notice: StartRsc: sntest2.dclux.loc
Start IPaddr_192_168_1_168
pengine[3485]: 2007/06/18_20:49:44 notice: Recurring:
sntest2.dclux.loc IPaddr_192_168_1_168_monitor_5000
crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition:
sntest2.dclux.loc: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[3484]: 2007/06/18_20:49:44 info: unpack_graph: Unpacked
transition 5: 2 actions in 2 synapses
tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating
action 3: IPaddr_192_168_1_168_start_0 on sntest2.dclux.loc
crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing
op=IPaddr_192_168_1_168_start_0
key=3:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1)
pengine[3485]: 2007/06/18_20:49:44 info: process_pe_message: Transition
5: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-149.bz2
IPaddr[3694]: 2007/06/18_20:49:44 INFO: Using calculated netmask for
192.168.1.168: 255.255.255.0
IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Using calculated broadcast
for 192.168.1.168: 192.168.1.255
IPaddr[3694]: 2007/06/18_20:49:44 INFO: eval /sbin/ifconfig eth0:0
192.168.1.168 netmask 255.255.255.0 broadcast 192.168.1.255
IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Sending Gratuitous Arp for
192.168.1.168 on eth0:0 [eth0]
crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation
IPaddr_192_168_1_168_start_0 (call=7, rc=0) complete
crmd[3472]: 2007/06/18_20:49:44 info: append_restart_list: Resource
IPaddr_192_168_1_168 does not support reloads
cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client:
3472, call:37): 0.1.24 -> 0.1.25 (ok)
tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff
(cib_update): 0.1.24 -> 0.1.25
tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action
IPaddr_192_168_1_168_start_0 (3) confirmed on
db1c3a49-0f90-424a-8f09-c85af7ed83a8
tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating
action 4: IPaddr_192_168_1_168_monitor_5000 on sntest2.dclux.loc
crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing
op=IPaddr_192_168_1_168_monitor_5000
key=4:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1)
cib[3763]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version
0.1.25 of the CIB to disk (digest: c564f9c3ff4c6ebfa2da07a1487e3195)
crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation
IPaddr_192_168_1_168_monitor_5000 (call=8, rc=0) complete
cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client:
3472, call:38): 0.1.25 -> 0.1.26 (ok)
tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff
(cib_update): 0.1.25 -> 0.1.26
tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action
IPaddr_192_168_1_168_monitor_5000 (4) confirmed on
db1c3a49-0f90-424a-8f09-c85af7ed83a8
tengine[3484]: 2007/06/18_20:49:44 info: run_graph: Transition 5:
(Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[3484]: 2007/06/18_20:49:44 info: notify_crmd: Transition 5
status: te_complete - <null>
crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition:
sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
cib[3778]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version
0.1.26 of the CIB to disk (digest: b4d9fdb02151cfe83f146108fab56b8c)
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3,
nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: LOST:
sntest1.dclux.loc
cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: PEER:
sntest2.dclux.loc
crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event
OC_EV_MS_INVALID from ccm
crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3,
nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4
crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost
after event=INVALID (id=3)
crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum
lost: triggering transition (INVALID)
crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: INVALID:
trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4
crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: CURRENT:
sntest2.dclux.loc [nodeid=2, born=3]
cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Local-only Change
(client:3472, call: 39): 0.1.26 (ok)
crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: LOST:
sntest1.dclux.loc [nodeid=1, born=2]
tengine[3484]: 2007/06/18_20:49:46 info: update_abort_priority: Abort
priority upgraded to 1000000
tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff
(cib_update): 0.1.26 -> 0.1.26
crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition:
sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [
input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ]
crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: All 1 cluster
nodes are eligable to run resources.
cib[3779]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version
0.1.26 of the CIB to disk (digest: bca029655a297d98a51ac667259e2a29)
pengine[3485]: 2007/06/18_20:49:46 info: log_data_element:
process_pe_message: [generation] <cib admin_epoch="0" epoch="1"
num_updates="26" node_fencing="yes" generated="true" have_quorum="false"
ignore_dtd="false" num_peers="2" ccm_transition="3"
cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/>
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value 'stop' for cluster option 'no-quorum-policy'
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value '60s' for cluster option 'cluster-delay'
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value '-1' for cluster option 'pe-error-series-max'
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value '-1' for cluster option 'pe-warn-series-max'
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value '-1' for cluster option 'pe-input-series-max'
pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default
value 'true' for cluster option 'startup-fencing'
pengine[3485]: 2007/06/18_20:49:46 WARN: cluster_status: We do not have
quorum - fencing and resource management disabled
pengine[3485]: 2007/06/18_20:49:46 info: determine_online_status: Node
sntest2.dclux.loc is online
pengine[3485]: 2007/06/18_20:49:46 info: native_print:
IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Started
sntest2.dclux.loc
pengine[3485]: 2007/06/18_20:49:46 notice: StopRsc: sntest2.dclux.loc
Stop IPaddr_192_168_1_168
crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition:
sntest2.dclux.loc: State transition S_POLICY_ENGINE ->
S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
origin=route_message ]
tengine[3484]: 2007/06/18_20:49:46 info: unpack_graph: Unpacked
transition 6: 1 actions in 1 synapses
tengine[3484]: 2007/06/18_20:49:46 info: send_rsc_command: Initiating
action 4: IPaddr_192_168_1_168_stop_0 on sntest2.dclux.loc
crmd[3472]: 2007/06/18_20:49:46 info: do_lrm_rsc_op: Performing
op=IPaddr_192_168_1_168_stop_0 key=4:6:be5645bb-9fc6-4331-9bce-6cad3a7d88f1)
pengine[3485]: 2007/06/18_20:49:46 info: process_pe_message: Transition
6: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-150.bz2
crmd[3472]: 2007/06/18_20:49:46 WARN: process_lrm_event: LRM operation
IPaddr_192_168_1_168_monitor_5000 (call=8, rc=-2) Cancelled
lrmd[3469]: 2007/06/18_20:49:46 info: RA output:
(IPaddr_192_168_1_168:stop:stderr) SIOCDELRT: No such process
IPaddr[3780]: 2007/06/18_20:49:46 INFO: /sbin/ifconfig eth0:0
192.168.1.168 down
crmd[3472]: 2007/06/18_20:49:46 info: process_lrm_event: LRM operation
IPaddr_192_168_1_168_stop_0 (call=10, rc=0) complete
cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Update (client:
3472, call:41): 0.1.26 -> 0.1.27 (ok)
tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff
(cib_update): 0.1.26 -> 0.1.27
tengine[3484]: 2007/06/18_20:49:46 info: match_graph_event: Action
IPaddr_192_168_1_168_stop_0 (4) confirmed on
db1c3a49-0f90-424a-8f09-c85af7ed83a8
tengine[3484]: 2007/06/18_20:49:46 info: run_graph: Transition 6:
(Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0)
tengine[3484]: 2007/06/18_20:49:46 info: notify_crmd: Transition 6
status: te_complete - <null>
crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition:
sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [
input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ]
cib[3800]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version
0.1.27 of the CIB to disk (digest: d622380a2238356b807686a901cd1b6c)
and then ... nothing.
sntest2 doesn't get the resource.
What am I missing ?
Thanks in advance,
Benjamin
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems