Hello everybody,

I'm pretty new with heartbeat and i've got one problem.
The main objective of my tests were to be able of moving resources from one node to one other easily, did it successfully with crm_resource.

The main problem i'm experiencing is that it seems that when i fire the first node which holds the IPaddr, the node 2 gets the ip BUT releases it immediatly.

The configuration :

node 1 : sntest1.dclux.loc
* eth0 : 192.168.1.166/24
* eth1 : 192.168.176.166/24

node 2 : sntest2.dclux.loc
* eth0 : 192.168.1.167/24
* eth1 : 192.168.176.166/24

Resource : 192.168.1.168/24 as an alias of eth0.


/etc/ha.d/ha.cf @sntest1 :
logfacility    local0
logfile        /var/log/ha-log
debugfile /var/log/ha-debug
keepalive 100ms
deadtime 250ms
warntime 500ms
initdead 120 # depend on your hardware
udpport 694
ping 192.168.176.167
bcast eth1
auto_failback off
node    sntest1.dclux.loc
node    sntest2.dclux.loc
#use_logd yes
compression     bz2
compression_threshold 2
crm yes

/etc/ha.d/ha.cf @sntest2 :
logfacility    local0
logfile        /var/log/ha-log
debugfile /var/log/ha-debug
keepalive 100ms
deadtime 250ms
warntime 500ms
initdead 120 # depend on your hardware
udpport 694
ping 192.168.176.166
bcast eth1
auto_failback off
node    sntest1.dclux.loc
node    sntest2.dclux.loc
#use_logd yes
compression     bz2
compression_threshold 2
crm yes

/var/lib/heartbeat/cib.xml :
<cib admin_epoch="0" node_fencing="yes" have_quorum="false" ignore_dtd="false" num_peers="2" ccm_transition="3" generated="true" dc_uuid="a3efd7cd-eca4-4ea5 -9bba-417225edb077" epoch="2" cib_feature_revision="1.3" num_updates="30" cib-last-written="Mon Jun 18 20:49:43 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
<nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/> <nvpair id="cib-bootstrap-options-no_quorum-policy" name="no_quorum-policy" value="stop"/> <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/> <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/> <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/> <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/> <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/> <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/> <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/> <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/> <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/> <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="5s"/> <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
<node id="db1c3a49-0f90-424a-8f09-c85af7ed83a8" uname="sntest2.dclux.loc" type="normal"/> <node id="a3efd7cd-eca4-4ea5-9bba-417225edb077" uname="sntest1.dclux.loc" type="normal"/>
    </nodes>
    <resources>
<primitive class="ocf" id="IPaddr_192_168_1_168" provider="heartbeat" type="IPaddr" failstop_type="stonith">
        <operations>
<op id="IPaddr_192_168_1_168_mon" interval="5s" name="monitor" timeout="5s"/>
        </operations>
        <instance_attributes id="IPaddr_192_168_1_168_inst_attr">
          <attributes>
<nvpair id="IPaddr_192_168_1_168_attr_0" name="ip" value="192.168.1.168"/> <nvpair id="IPaddr_192_168_1_168_attr_1" name="netmask" value="24"/> <nvpair id="IPaddr_192_168_1_168_attr_2" name="nic" value="eth0"/>
          </attributes>
        </instance_attributes>
      </primitive>
    </resources>
    <constraints>
<rsc_location id="rsc_location_IPaddr_192_168_1_168" rsc="IPaddr_192_168_1_168">
        <rule id="prefered_location_IPaddr_192_168_1_168" score="100">
<expression attribute="#uname" id="prefered_location_IPaddr_192_168_1_168_expr" operation="eq" value="sntest1.dclux.loc"/>
        </rule>
      </rsc_location>
<rsc_location rsc="IPaddr_192_168_1_168" id="cli-prefer-IPaddr_192_168_1_168">
        <rule score="INFINITY" id="cli-prefer-rule-IPaddr_192_168_1_168">
<expression attribute="#uname" operation="eq" type="string" id="cli-prefer-expr-IPaddr_192_168_1_168" value="sntest1.dclux.loc"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>


**** Time frame of the test ****

* heartbeat running, resource IPaddr_192_168_1_168 is on sntest1.
* i do by hand : "ifconfig eth1 down" on sntest1

here are the logs on sntest2 after this :

heartbeat[3457]: 2007/06/18_20:49:44 WARN: node 192.168.176.166: is dead
heartbeat[3457]: 2007/06/18_20:49:44 WARN: node sntest1.dclux.loc: is dead
crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status update: Node 192.168.176.166 now has status [dead] heartbeat[3457]: 2007/06/18_20:49:44 info: Link 192.168.176.166:192.168.176.166 dead.
heartbeat[3457]: 2007/06/18_20:49:44 info: Link sntest1.dclux.loc:eth1 dead.
crmd[3472]: 2007/06/18_20:49:44 WARN: get_uuid: Could not calculate UUID for 192.168.176.166 crmd[3472]: 2007/06/18_20:49:44 info: crmd_ha_status_callback: Ping node 192.168.176.166 is dead crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status update: Node sntest1.dclux.loc now has status [dead] cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Local-only Change (client:3472, call: 34): 0.1.24 (ok) cib[3693]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.24 of the CIB to disk (digest: 4e309facbe71d928e0ec8b6e52a151f0) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.24 -> 0.1.24 tengine[3484]: 2007/06/18_20:49:44 WARN: match_down_event: No match for shutdown action on a3efd7cd-eca4-4ea5-9bba-417225edb077 tengine[3484]: 2007/06/18_20:49:44 info: extract_event: Stonith/shutdown of a3efd7cd-eca4-4ea5-9bba-417225edb077 not matched tengine[3484]: 2007/06/18_20:49:44 info: update_abort_priority: Abort priority upgraded to 1000000 tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Aborting on transient_attributes deletions crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ] crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: All 2 cluster nodes are eligable to run resources. pengine[3485]: 2007/06/18_20:49:44 info: log_data_element: process_pe_message: [generation] <cib admin_epoch="0" epoch="1" num_updates="24" node_fencing="yes" generated="true" have_quorum="true" ignore_dtd="false" num_peers="2" ccm_transition="2" cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max' pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing' pengine[3485]: 2007/06/18_20:49:44 info: determine_online_status: Node sntest2.dclux.loc is online pengine[3485]: 2007/06/18_20:49:44 info: native_print: IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Stopped pengine[3485]: 2007/06/18_20:49:44 notice: StartRsc: sntest2.dclux.loc Start IPaddr_192_168_1_168 pengine[3485]: 2007/06/18_20:49:44 notice: Recurring: sntest2.dclux.loc IPaddr_192_168_1_168_monitor_5000 crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] tengine[3484]: 2007/06/18_20:49:44 info: unpack_graph: Unpacked transition 5: 2 actions in 2 synapses tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action 3: IPaddr_192_168_1_168_start_0 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_start_0 key=3:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) pengine[3485]: 2007/06/18_20:49:44 info: process_pe_message: Transition 5: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-149.bz2 IPaddr[3694]: 2007/06/18_20:49:44 INFO: Using calculated netmask for 192.168.1.168: 255.255.255.0 IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Using calculated broadcast for 192.168.1.168: 192.168.1.255 IPaddr[3694]: 2007/06/18_20:49:44 INFO: eval /sbin/ifconfig eth0:0 192.168.1.168 netmask 255.255.255.0 broadcast 192.168.1.255 IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Sending Gratuitous Arp for 192.168.1.168 on eth0:0 [eth0] crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_start_0 (call=7, rc=0) complete crmd[3472]: 2007/06/18_20:49:44 info: append_restart_list: Resource IPaddr_192_168_1_168 does not support reloads cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, call:37): 0.1.24 -> 0.1.25 (ok) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.24 -> 0.1.25 tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action IPaddr_192_168_1_168_start_0 (3) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action 4: IPaddr_192_168_1_168_monitor_5000 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_monitor_5000 key=4:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) cib[3763]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.25 of the CIB to disk (digest: c564f9c3ff4c6ebfa2da07a1487e3195) crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_monitor_5000 (call=8, rc=0) complete cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, call:38): 0.1.25 -> 0.1.26 (ok) tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff (cib_update): 0.1.25 -> 0.1.26 tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action IPaddr_192_168_1_168_monitor_5000 (4) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:44 info: run_graph: Transition 5: (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0) tengine[3484]: 2007/06/18_20:49:44 info: notify_crmd: Transition 5 status: te_complete - <null> crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] cib[3778]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.26 of the CIB to disk (digest: b4d9fdb02151cfe83f146108fab56b8c) cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info
cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: LOST: sntest1.dclux.loc cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: PEER: sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost after event=INVALID (id=3) crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost: triggering transition (INVALID) crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: INVALID: trans=3, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4 crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: CURRENT: sntest2.dclux.loc [nodeid=2, born=3] cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Local-only Change (client:3472, call: 39): 0.1.26 (ok) crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: LOST: sntest1.dclux.loc [nodeid=1, born=2] tengine[3484]: 2007/06/18_20:49:46 info: update_abort_priority: Abort priority upgraded to 1000000 tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff (cib_update): 0.1.26 -> 0.1.26 crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_IPC_MESSAGE origin=route_message ] crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: All 1 cluster nodes are eligable to run resources. cib[3779]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.26 of the CIB to disk (digest: bca029655a297d98a51ac667259e2a29) pengine[3485]: 2007/06/18_20:49:46 info: log_data_element: process_pe_message: [generation] <cib admin_epoch="0" epoch="1" num_updates="26" node_fencing="yes" generated="true" have_quorum="false" ignore_dtd="false" num_peers="2" ccm_transition="3" cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value 'stop' for cluster option 'no-quorum-policy' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '60s' for cluster option 'cluster-delay' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-error-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-warn-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value '-1' for cluster option 'pe-input-series-max' pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default value 'true' for cluster option 'startup-fencing' pengine[3485]: 2007/06/18_20:49:46 WARN: cluster_status: We do not have quorum - fencing and resource management disabled pengine[3485]: 2007/06/18_20:49:46 info: determine_online_status: Node sntest2.dclux.loc is online pengine[3485]: 2007/06/18_20:49:46 info: native_print: IPaddr_192_168_1_168 (heartbeat::ocf:IPaddr): Started sntest2.dclux.loc pengine[3485]: 2007/06/18_20:49:46 notice: StopRsc: sntest2.dclux.loc Stop IPaddr_192_168_1_168 crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] tengine[3484]: 2007/06/18_20:49:46 info: unpack_graph: Unpacked transition 6: 1 actions in 1 synapses tengine[3484]: 2007/06/18_20:49:46 info: send_rsc_command: Initiating action 4: IPaddr_192_168_1_168_stop_0 on sntest2.dclux.loc crmd[3472]: 2007/06/18_20:49:46 info: do_lrm_rsc_op: Performing op=IPaddr_192_168_1_168_stop_0 key=4:6:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) pengine[3485]: 2007/06/18_20:49:46 info: process_pe_message: Transition 6: PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-150.bz2 crmd[3472]: 2007/06/18_20:49:46 WARN: process_lrm_event: LRM operation IPaddr_192_168_1_168_monitor_5000 (call=8, rc=-2) Cancelled lrmd[3469]: 2007/06/18_20:49:46 info: RA output: (IPaddr_192_168_1_168:stop:stderr) SIOCDELRT: No such process

IPaddr[3780]: 2007/06/18_20:49:46 INFO: /sbin/ifconfig eth0:0 192.168.1.168 down crmd[3472]: 2007/06/18_20:49:46 info: process_lrm_event: LRM operation IPaddr_192_168_1_168_stop_0 (call=10, rc=0) complete cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Update (client: 3472, call:41): 0.1.26 -> 0.1.27 (ok) tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff (cib_update): 0.1.26 -> 0.1.27 tengine[3484]: 2007/06/18_20:49:46 info: match_graph_event: Action IPaddr_192_168_1_168_stop_0 (4) confirmed on db1c3a49-0f90-424a-8f09-c85af7ed83a8 tengine[3484]: 2007/06/18_20:49:46 info: run_graph: Transition 6: (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0) tengine[3484]: 2007/06/18_20:49:46 info: notify_crmd: Transition 6 status: te_complete - <null> crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_IPC_MESSAGE origin=route_message ] cib[3800]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.27 of the CIB to disk (digest: d622380a2238356b807686a901cd1b6c)

and then ... nothing.
sntest2 doesn't get the resource.

What am I missing ?

Thanks in advance,

Benjamin
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Reply via email to