Hello everybody,
The main objective of my tests were to be able of moving resources from one node to one other easily, did it successfully with crm_resource. The main problem i'm experiencing is that it seems that when i fire the first node which holds the IPaddr, the node 2 gets the ip BUT releases it immediatly. The configuration : node 1 : sntest1.dclux.loc * eth0 : 192.168.1.166/24 * eth1 : 192.168.176.166/24 node 2 : sntest2.dclux.loc * eth0 : 192.168.1.167/24 * eth1 : 192.168.176.166/24 Resource : 192.168.1.168/24 as an alias of eth0. /etc/ha.d/ha.cf @sntest1 : logfacility local0 logfile /var/log/ha-log debugfile /var/log/ha-debug keepalive 100ms deadtime 250ms warntime 500ms initdead 120 # depend on your hardware udpport 694 ping 192.168.176.167 bcast eth1 auto_failback off node sntest1.dclux.loc node sntest2.dclux.loc #use_logd yes compression bz2 compression_threshold 2 crm yes /etc/ha.d/ha.cf @sntest2 : logfacility local0 logfile /var/log/ha-log debugfile /var/log/ha-debug keepalive 100ms deadtime 250ms warntime 500ms initdead 120 # depend on your hardware udpport 694 ping 192.168.176.166 bcast eth1 auto_failback off node sntest1.dclux.loc node sntest2.dclux.loc #use_logd yes compression bz2 compression_threshold 2 crm yes /var/lib/heartbeat/cib.xml : <cib admin_epoch="0" node_fencing="yes" have_quorum="false" ignore_dtd="false" num_peers="2" ccm_transition="3" generated="true" dc_uuid="a3efd7cd-eca4-4ea5-9bba-417225edb077" epoch="2" cib_feature_revision="1.3" num_updates="30" cib-last-written="Mon Jun 18 20:49:43 2007"> <configuration> <crm_config> <cluster_property_set id="cib-bootstrap-options"> <attributes> <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/> <nvpair id="cib-bootstrap-options-no_quorum-policy" name="no_quorum-policy" value="stop"/> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/> <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/> <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/> <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/> <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/> <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/> <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/> <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/> <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="5s"/> <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/> </attributes> </cluster_property_set> </crm_config> <nodes> <node id="db1c3a49-0f90-424a-8f09-c85af7ed83a8" uname="sntest2.dclux.loc" type="normal"/> <node id="a3efd7cd-eca4-4ea5-9bba-417225edb077" uname="sntest1.dclux.loc" type="normal"/> </nodes> <resources> <primitive class="ocf" id="IPaddr_192_168_1_168" provider="heartbeat" type="IPaddr" failstop_type="stonith"> <operations> <op id="IPaddr_192_168_1_168_mon" interval="5s" name="monitor" timeout="5s"/> </operations> <instance_attributes id="IPaddr_192_168_1_168_inst_attr"> <attributes> <nvpair id="IPaddr_192_168_1_168_attr_0" name="ip" value="192.168.1.168"/> <nvpair id="IPaddr_192_168_1_168_attr_1" name="netmask" value="24"/> <nvpair id="IPaddr_192_168_1_168_attr_2" name="nic" value="eth0"/> </attributes> </instance_attributes> </primitive> </resources> <constraints> <rsc_location id="rsc_location_IPaddr_192_168_1_168" rsc="IPaddr_192_168_1_168"> <rule id="prefered_location_IPaddr_192_168_1_168" score="100"> <expression attribute="#uname" id="prefered_location_IPaddr_192_168_1_168_expr" operation="eq" value="sntest1.dclux.loc"/> </rule> </rsc_location> <rsc_location rsc="IPaddr_192_168_1_168" id="cli-prefer-IPaddr_192_168_1_168"> <rule score="INFINITY" id="cli-prefer-rule-IPaddr_192_168_1_168"> <expression attribute="#uname" operation="eq" type="string" id="cli-prefer-expr-IPaddr_192_168_1_168" value="sntest1.dclux.loc"/> </rule> </rsc_location> </constraints> </configuration> </cib> **** Time frame of the test **** * heartbeat running, resource IPaddr_192_168_1_168 is on sntest1. * i do by hand : "ifconfig eth1 down" on sntest1 here are the logs on sntest2 after this : heartbeat[3457]: 2007/06/18_20:49:44 WARN: node 192.168.176.166: is dead heartbeat[3457]: 2007/06/18_20:49:44 WARN: node sntest1.dclux.loc: is dead crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status update: Node 192.168.176.166 now has status [dead] heartbeat[3457]: 2007/06/18_20:49:44 info: Link 192.168.176.166:192.168.176.166 dead. heartbeat[3457]: 2007/06/18_20:49:44 info: Link sntest1.dclux.loc:eth1 dead. crmd[3472]: 2007/06/18_20:49:44 WARN: get_uuid: Could not calculate UUID for 192.168.176.166 crmd[3472]: 2007/06/18_20:49:44 info: crmd_ha_status_callback: Ping node 192.168.176.166 is dead crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status update: Node sntest1.dclux.loc now has status [dead] cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Local-only Change (client:3472, call: 34): 0.1.24 (ok) cib[3693]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.24 of the CIB to disk (digest: 4e309facbe71d928e0ec8b6e52a151f0) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.24 -> 0.1.24 tengine[3484]: 2007/06/18_20:49:44 WARN: match_down_event: No match for shutdown action on a3efd7cd-eca4-4ea5-9bba-417225edb077 tengine[3484]: 2007/06/18_20:49:44 info: extract_event: Stonith/shutdown of a3efd7cd-eca4-4ea5-9bba-417225edb077 not matched tengine[3484]: 2007/06/18_20:49:44 info: update_abort_priority: Abort priority upgraded to 1000000 tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Aborting on transient_attributes deletions crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ] crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: All 2 cluster nodes are eligable to run resources. pengine[3485]: 2007/06/18_20:49:44 info: log_data_element: process_pe_message: [generation] <cib admin_epoch="0" epoch="1" num_updates="24" node_fencing="yes" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" ccm_transition="2" cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing' pengine[3485]: 2007/06/18_20:49:44 info: determine_online_status: Node sntest2.dclux.loc is online pengine[3485]: 2007/06/18_20:49:44 info: native_print: IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Stopped pengine[3485]: 2007/06/18_20:49:44 notice: StartRsc: sntest2.dclux.loc Start IPaddr_192_168_1_168 pengine[3485]: 2007/06/18_20:49:44 notice: Recurring: sntest2.dclux.loc IPaddr_192_168_1_168_monitor_5000 crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] tengine[3484]: 2007/06/18_20:49:44 info: unpack_graph: Unpacked transition 5: 2 actions in 2 synapses tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action 3: IPaddr_192_168_1_168_start_0 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_start_0 key=3:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) pengine[3485]: 2007/06/18_20:49:44 info: process_pe_message: Transition 5: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-149.bz2 IPaddr[3694]: 2007/06/18_20:49:44 INFO: Using calculated netmask for 192.168.1.168: 255.255.255.0 IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Using calculated broadcast for 192.168.1.168: 192.168.1.255 IPaddr[3694]: 2007/06/18_20:49:44 INFO: eval /sbin/ifconfig eth0:0 192.168.1.168 netmask 255.255.255.0 broadcast 192.168.1.255 IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Sending Gratuitous Arp for 192.168.1.168 on eth0:0 [eth0] crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_start_0 (call=7, rc=0) complete crmd[3472]: 2007/06/18_20:49:44 info: append_restart_list: Resource IPaddr_192_168_1_168 does not support reloads cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, call:37): 0.1.24 -> 0.1.25 (ok) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.24 -> 0.1.25 tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action IPaddr_192_168_1_168_start_0 (3) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action 4: IPaddr_192_168_1_168_monitor_5000 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_monitor_5000 key=4:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) cib[3763]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.25 of the CIB to disk (digest: c564f9c3ff4c6ebfa2da07a1487e3195) crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_monitor_5000 (call=8, rc=0) complete cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, call:38): 0.1.25 -> 0.1.26 (ok) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.25 -> 0.1.26 tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action IPaddr_192_168_1_168_monitor_5000 (4) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:44 info: run_graph: Transition 5: (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0) tengine[3484]: 2007/06/18_20:49:44 info: notify_crmd: Transition 5 status: te_complete - <null> crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] cib[3778]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.26 of the CIB to disk (digest: b4d9fdb02151cfe83f146108fab56b8c) cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: LOST: sntest1.dclux.loc cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: PEER: sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=3) crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost: triggering transition (INVALID) crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: INVALID: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4 crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: CURRENT: sntest2.dclux.loc [nodeid=2, born=3] cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Local-only Change (client:3472, call: 39): 0.1.26 (ok) crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: LOST: sntest1.dclux.loc [nodeid=1, born=2] tengine[3484]: 2007/06/18_20:49:46 info: update_abort_priority: Abort priority upgraded to 1000000 tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff (cib_update): 0.1.26 -> 0.1.26 crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ] crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: All 1 cluster nodes are eligable to run resources. cib[3779]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.26 of the CIB to disk (digest: bca029655a297d98a51ac667259e2a29) pengine[3485]: 2007/06/18_20:49:46 info: log_data_element: process_pe_message: [generation] <cib admin_epoch="0" epoch="1" num_updates="26" node_fencing="yes" generated="true" have_quorum="false" ignore_dtd="false" num_peers="2" ccm_transition="3" cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing' pengine[3485]: 2007/06/18_20:49:46 WARN: cluster_status: We do not have quorum - fencing and resource management disabled pengine[3485]: 2007/06/18_20:49:46 info: determine_online_status: Node sntest2.dclux.loc is online pengine[3485]: 2007/06/18_20:49:46 info: native_print: IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Started sntest2.dclux.loc pengine[3485]: 2007/06/18_20:49:46 notice: StopRsc: sntest2.dclux.loc Stop IPaddr_192_168_1_168 crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] tengine[3484]: 2007/06/18_20:49:46 info: unpack_graph: Unpacked transition 6: 1 actions in 1 synapses tengine[3484]: 2007/06/18_20:49:46 info: send_rsc_command: Initiating action 4: IPaddr_192_168_1_168_stop_0 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:46 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_stop_0 key=4:6:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) pengine[3485]: 2007/06/18_20:49:46 info: process_pe_message: Transition 6: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-150.bz2 crmd[3472]: 2007/06/18_20:49:46 WARN: process_lrm_event: LRM operation IPaddr_192_168_1_168_monitor_5000 (call=8, rc=-2) Cancelled lrmd[3469]: 2007/06/18_20:49:46 info: RA output: (IPaddr_192_168_1_168:stop:stderr) SIOCDELRT: No such process IPaddr[3780]: 2007/06/18_20:49:46 INFO: /sbin/ifconfig eth0:0 192.168.1.168 down crmd[3472]: 2007/06/18_20:49:46 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_stop_0 (call=10, rc=0) complete cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Update (client: 3472, call:41): 0.1.26 -> 0.1.27 (ok) tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff (cib_update): 0.1.26 -> 0.1.27 tengine[3484]: 2007/06/18_20:49:46 info: match_graph_event: Action IPaddr_192_168_1_168_stop_0 (4) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:46 info: run_graph: Transition 6: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0) tengine[3484]: 2007/06/18_20:49:46 info: notify_crmd: Transition 6 status: te_complete - <null> crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] cib[3800]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.27 of the CIB to disk (digest: d622380a2238356b807686a901cd1b6c) and then I'm in a state where the resource is nowhere. What am I missing/doing wrong ? Thanks in advance, Benjamin ________________________________ -------------------------------------------------------- This e-mail and any attached files are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail by mistake, please notify the sender immediately and delete it from your system. You must not copy the message or disclose its contents to anyone. -------------------------------------------------------- _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
