Hi, No sadly I didn't see your last message. But good news is, I restarted configuration from scratch using the tutorial from Alan Robertson and everything goes smoothly and I don't have any problem by now.
Thank you for your help. Best regards, Benjamin -----Original Message----- From: Dejan Muhamedagic [mailto:[EMAIL PROTECTED] Sent: jeudi 21 juin 2007 13:41 To: Benjamin KRAFT Cc: [email protected] Subject: Re: [Linux-HA] Ipaddr resource unwanted stop after one node dies Hi, Did you see my reply to your previous message? It was a bit delayed. Cheers, Dejan On Wed, Jun 20, 2007 at 09:19:51AM +0200, Benjamin KRAFT wrote: > Hello everybody, > > > > The main objective of my tests were to be able of moving resources from one > node to one other easily, did it successfully with crm_resource. > > > > The main problem i'm experiencing is that it seems that when i fire the first > node which holds the IPaddr, the node 2 gets the ip BUT releases it > immediatly. > > > > The configuration : > > > > node 1 : sntest1.dclux.loc > > * eth0 : 192.168.1.166/24 > > * eth1 : 192.168.176.166/24 > > > > node 2 : sntest2.dclux.loc > > * eth0 : 192.168.1.167/24 > > * eth1 : 192.168.176.166/24 > > > > Resource : 192.168.1.168/24 as an alias of eth0. > > > > > > /etc/ha.d/ha.cf @sntest1 : > > logfacility local0 > > logfile /var/log/ha-log > > debugfile /var/log/ha-debug > > keepalive 100ms > > deadtime 250ms > > warntime 500ms > > initdead 120 # depend on your hardware > > udpport 694 > > ping 192.168.176.167 > > bcast eth1 > > auto_failback off > > node sntest1.dclux.loc > > node sntest2.dclux.loc > > #use_logd yes > > compression bz2 > > compression_threshold 2 > > crm yes > > > > /etc/ha.d/ha.cf @sntest2 : > > logfacility local0 > > logfile /var/log/ha-log > > debugfile /var/log/ha-debug > > keepalive 100ms > > deadtime 250ms > > warntime 500ms > > initdead 120 # depend on your hardware > > udpport 694 > > ping 192.168.176.166 > > bcast eth1 > > auto_failback off > > node sntest1.dclux.loc > > node sntest2.dclux.loc > > #use_logd yes > > compression bz2 > > compression_threshold 2 > > crm yes > > > > /var/lib/heartbeat/cib.xml : > > <cib admin_epoch="0" node_fencing="yes" have_quorum="false" > ignore_dtd="false" num_peers="2" ccm_transition="3" generated="true" > dc_uuid="a3efd7cd-eca4-4ea5-9bba-417225edb077" epoch="2" > cib_feature_revision="1.3" num_updates="30" cib-last-written="Mon Jun 18 > 20:49:43 2007"> > > <configuration> > > <crm_config> > > <cluster_property_set id="cib-bootstrap-options"> > > <attributes> > > <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" > value="true"/> > > <nvpair id="cib-bootstrap-options-no_quorum-policy" name="no_quorum-policy" > value="stop"/> > > <nvpair id="cib-bootstrap-options-default-resource-stickiness" > name="default-resource-stickiness" value="0"/> > > <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" > name="default-resource-failure-stickiness" value="0"/> > > <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" > value="false"/> > > <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" > value="reboot"/> > > <nvpair id="cib-bootstrap-options-stop-orphan-resources" > name="stop-orphan-resources" value="true"/> > > <nvpair id="cib-bootstrap-options-stop-orphan-actions" > name="stop-orphan-actions" value="true"/> > > <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" > value="false"/> > > <nvpair id="cib-bootstrap-options-short-resource-names" > name="short-resource-names" value="true"/> > > <nvpair id="cib-bootstrap-options-transition-idle-timeout" > name="transition-idle-timeout" value="5min"/> > > <nvpair id="cib-bootstrap-options-default-action-timeout" > name="default-action-timeout" value="5s"/> > > <nvpair id="cib-bootstrap-options-is-managed-default" > name="is-managed-default" value="true"/> > > </attributes> > > </cluster_property_set> > > </crm_config> > > <nodes> > > <node id="db1c3a49-0f90-424a-8f09-c85af7ed83a8" uname="sntest2.dclux.loc" > type="normal"/> > > <node id="a3efd7cd-eca4-4ea5-9bba-417225edb077" uname="sntest1.dclux.loc" > type="normal"/> > > </nodes> > > <resources> > > <primitive class="ocf" id="IPaddr_192_168_1_168" provider="heartbeat" > type="IPaddr" failstop_type="stonith"> > > <operations> > > <op id="IPaddr_192_168_1_168_mon" interval="5s" name="monitor" timeout="5s"/> > > </operations> > > <instance_attributes id="IPaddr_192_168_1_168_inst_attr"> > > <attributes> > > <nvpair id="IPaddr_192_168_1_168_attr_0" name="ip" value="192.168.1.168"/> > > <nvpair id="IPaddr_192_168_1_168_attr_1" name="netmask" value="24"/> > > <nvpair id="IPaddr_192_168_1_168_attr_2" name="nic" value="eth0"/> > > </attributes> > > </instance_attributes> > > </primitive> > > </resources> > > <constraints> > > <rsc_location id="rsc_location_IPaddr_192_168_1_168" > rsc="IPaddr_192_168_1_168"> > > <rule id="prefered_location_IPaddr_192_168_1_168" score="100"> > > <expression attribute="#uname" > id="prefered_location_IPaddr_192_168_1_168_expr" operation="eq" > value="sntest1.dclux.loc"/> > > </rule> > > </rsc_location> > > <rsc_location rsc="IPaddr_192_168_1_168" id="cli-prefer-IPaddr_192_168_1_168"> > > <rule score="INFINITY" id="cli-prefer-rule-IPaddr_192_168_1_168"> > > <expression attribute="#uname" operation="eq" type="string" > id="cli-prefer-expr-IPaddr_192_168_1_168" value="sntest1.dclux.loc"/> > > </rule> > > </rsc_location> > > </constraints> > > </configuration> > > </cib> > > > > > > **** Time frame of the test **** > > > > * heartbeat running, resource IPaddr_192_168_1_168 is on sntest1. > > * i do by hand : "ifconfig eth1 down" on sntest1 > > > > here are the logs on sntest2 after this : > > > > heartbeat[3457]: 2007/06/18_20:49:44 WARN: node 192.168.176.166: is dead > > heartbeat[3457]: 2007/06/18_20:49:44 WARN: node sntest1.dclux.loc: is dead > > crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status > update: Node 192.168.176.166 now has status [dead] > > heartbeat[3457]: 2007/06/18_20:49:44 info: Link > 192.168.176.166:192.168.176.166 dead. > > heartbeat[3457]: 2007/06/18_20:49:44 info: Link sntest1.dclux.loc:eth1 dead. > > crmd[3472]: 2007/06/18_20:49:44 WARN: get_uuid: Could not calculate UUID for > 192.168.176.166 > > crmd[3472]: 2007/06/18_20:49:44 info: crmd_ha_status_callback: Ping node > 192.168.176.166 is dead > > crmd[3472]: 2007/06/18_20:49:44 notice: crmd_ha_status_callback: Status > update: Node sntest1.dclux.loc now has status [dead] > > cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Local-only Change > (client:3472, call: 34): 0.1.24 (ok) > > cib[3693]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.24 > of the CIB to disk (digest: 4e309facbe71d928e0ec8b6e52a151f0) > > tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff > (cib_update): 0.1.24 -> 0.1.24 > > tengine[3484]: 2007/06/18_20:49:44 WARN: match_down_event: No match for > shutdown action on a3efd7cd-eca4-4ea5-9bba-417225edb077 > > tengine[3484]: 2007/06/18_20:49:44 info: extract_event: Stonith/shutdown of > a3efd7cd-eca4-4ea5-9bba-417225edb077 not matched > > tengine[3484]: 2007/06/18_20:49:44 info: update_abort_priority: Abort > priority upgraded to 1000000 > > tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Aborting on > transient_attributes deletions > > crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: > State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC > cause=C_IPC_MESSAGE origin=route_message ] > > crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: All 2 cluster > nodes are eligable to run resources. > > pengine[3485]: 2007/06/18_20:49:44 info: log_data_element: > process_pe_message: [generation] <cib admin_epoch="0" epoch="1" > num_updates="24" node_fencing="yes" generated="true" have_quorum="true" > ignore_dtd="false" num_peers="2" ccm_transition="2" > cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value 'stop' for cluster option 'no-quorum-policy' > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value '60s' for cluster option 'cluster-delay' > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value '-1' for cluster option 'pe-error-series-max' > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value '-1' for cluster option 'pe-warn-series-max' > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value '-1' for cluster option 'pe-input-series-max' > > pengine[3485]: 2007/06/18_20:49:44 notice: cluster_option: Using default > value 'true' for cluster option 'startup-fencing' > > pengine[3485]: 2007/06/18_20:49:44 info: determine_online_status: Node > sntest2.dclux.loc is online > > pengine[3485]: 2007/06/18_20:49:44 info: native_print: IPaddr_192_168_1_168 > (heartbeat::ocf:IPaddr): Stopped > > pengine[3485]: 2007/06/18_20:49:44 notice: StartRsc: sntest2.dclux.loc Start > IPaddr_192_168_1_168 > > pengine[3485]: 2007/06/18_20:49:44 notice: Recurring: sntest2.dclux.loc > IPaddr_192_168_1_168_monitor_5000 > > crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: > State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS > cause=C_IPC_MESSAGE origin=route_message ] > > tengine[3484]: 2007/06/18_20:49:44 info: unpack_graph: Unpacked transition 5: > 2 actions in 2 synapses > > tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action > 3: IPaddr_192_168_1_168_start_0 on sntest2.dclux.loc > > crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing > op=IPaddr_192_168_1_168_start_0 key=3:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) > > pengine[3485]: 2007/06/18_20:49:44 info: process_pe_message: Transition 5: > PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-149.bz2 > > IPaddr[3694]: 2007/06/18_20:49:44 INFO: Using calculated netmask for > 192.168.1.168: 255.255.255.0 > > IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Using calculated broadcast for > 192.168.1.168: 192.168.1.255 > > IPaddr[3694]: 2007/06/18_20:49:44 INFO: eval /sbin/ifconfig eth0:0 > 192.168.1.168 netmask 255.255.255.0 broadcast 192.168.1.255 > > IPaddr[3694]: 2007/06/18_20:49:44 DEBUG: Sending Gratuitous Arp for > 192.168.1.168 on eth0:0 [eth0] > > crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation > IPaddr_192_168_1_168_start_0 (call=7, rc=0) complete > > crmd[3472]: 2007/06/18_20:49:44 info: append_restart_list: Resource > IPaddr_192_168_1_168 does not support reloads > > cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, > call:37): 0.1.24 -> 0.1.25 (ok) > > tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff > (cib_update): 0.1.24 -> 0.1.25 > > tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action > IPaddr_192_168_1_168_start_0 (3) confirmed on > db1c3a49-0f90-424a-8f09-c85af7ed83a8 > > tengine[3484]: 2007/06/18_20:49:44 info: send_rsc_command: Initiating action > 4: IPaddr_192_168_1_168_monitor_5000 on sntest2.dclux.loc > > crmd[3472]: 2007/06/18_20:49:44 info: do_lrm_rsc_op: Performing > op=IPaddr_192_168_1_168_monitor_5000 > key=4:5:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) > > cib[3763]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.25 > of the CIB to disk (digest: c564f9c3ff4c6ebfa2da07a1487e3195) > > crmd[3472]: 2007/06/18_20:49:44 info: process_lrm_event: LRM operation > IPaddr_192_168_1_168_monitor_5000 (call=8, rc=0) complete > > cib[3468]: 2007/06/18_20:49:44 info: cib_diff_notify: Update (client: 3472, > call:38): 0.1.25 -> 0.1.26 (ok) > > tengine[3484]: 2007/06/18_20:49:44 info: te_update_diff: Processing diff > (cib_update): 0.1.25 -> 0.1.26 > > tengine[3484]: 2007/06/18_20:49:44 info: match_graph_event: Action > IPaddr_192_168_1_168_monitor_5000 (4) confirmed on > db1c3a49-0f90-424a-8f09-c85af7ed83a8 > > tengine[3484]: 2007/06/18_20:49:44 info: run_graph: Transition 5: > (Complete=2, Pending=0, Fired=0, Skipped=0, Incomplete=0) > > tengine[3484]: 2007/06/18_20:49:44 info: notify_crmd: Transition 5 status: > te_complete - <null> > > crmd[3472]: 2007/06/18_20:49:44 info: do_state_transition: sntest2.dclux.loc: > State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS > cause=C_IPC_MESSAGE origin=route_message ] > > cib[3778]: 2007/06/18_20:49:44 info: write_cib_contents: Wrote version 0.1.26 > of the CIB to disk (digest: b4d9fdb02151cfe83f146108fab56b8c) > > cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event > OC_EV_MS_INVALID from ccm > > crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event > OC_EV_MS_INVALID from ccm > > cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info > cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event > OC_EV_MS_INVALID from ccm > > crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: no mbr_track info > > cib[3468]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, > new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 > > cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: LOST: > sntest1.dclux.loc > > cib[3468]: 2007/06/18_20:49:46 info: cib_ccm_msg_callback: PEER: > sntest2.dclux.loc > > crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: Got an event > OC_EV_MS_INVALID from ccm > > crmd[3472]: 2007/06/18_20:49:46 info: mem_handle_event: instance=3, nodes=1, > new=0, lost=1, n_idx=0, new_idx=1, old_idx=4 > > crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost > after event=INVALID (id=3) > > crmd[3472]: 2007/06/18_20:49:46 info: crmd_ccm_msg_callback: Quorum lost: > triggering transition (INVALID) > > crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: INVALID: trans=3, > nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=4 > > crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: CURRENT: > sntest2.dclux.loc [nodeid=2, born=3] > > cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Local-only Change > (client:3472, call: 39): 0.1.26 (ok) > > crmd[3472]: 2007/06/18_20:49:46 info: ccm_event_detail: LOST: > sntest1.dclux.loc [nodeid=1, born=2] > > tengine[3484]: 2007/06/18_20:49:46 info: update_abort_priority: Abort > priority upgraded to 1000000 > > tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff > (cib_update): 0.1.26 -> 0.1.26 > > crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: > State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC > cause=C_IPC_MESSAGE origin=route_message ] > > crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: All 1 cluster > nodes are eligable to run resources. > > cib[3779]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.26 > of the CIB to disk (digest: bca029655a297d98a51ac667259e2a29) > > pengine[3485]: 2007/06/18_20:49:46 info: log_data_element: > process_pe_message: [generation] <cib admin_epoch="0" epoch="1" > num_updates="26" node_fencing="yes" generated="true" have_quorum="false" > ignore_dtd="false" num_peers="2" ccm_transition="3" > cib_feature_revision="1.3" dc_uuid="db1c3a49-0f90-424a-8f09-c85af7ed83a8"/> > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value 'stop' for cluster option 'no-quorum-policy' > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value '60s' for cluster option 'cluster-delay' > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value '-1' for cluster option 'pe-error-series-max' > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value '-1' for cluster option 'pe-warn-series-max' > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value '-1' for cluster option 'pe-input-series-max' > > pengine[3485]: 2007/06/18_20:49:46 notice: cluster_option: Using default > value 'true' for cluster option 'startup-fencing' > > pengine[3485]: 2007/06/18_20:49:46 WARN: cluster_status: We do not have > quorum - fencing and resource management disabled > > pengine[3485]: 2007/06/18_20:49:46 info: determine_online_status: Node > sntest2.dclux.loc is online > > pengine[3485]: 2007/06/18_20:49:46 info: native_print: IPaddr_192_168_1_168 > (heartbeat::ocf:IPaddr): Started sntest2.dclux.loc > > pengine[3485]: 2007/06/18_20:49:46 notice: StopRsc: sntest2.dclux.loc Stop > IPaddr_192_168_1_168 > > crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: > State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS > cause=C_IPC_MESSAGE origin=route_message ] > > tengine[3484]: 2007/06/18_20:49:46 info: unpack_graph: Unpacked transition 6: > 1 actions in 1 synapses > > tengine[3484]: 2007/06/18_20:49:46 info: send_rsc_command: Initiating action > 4: IPaddr_192_168_1_168_stop_0 on sntest2.dclux.loc > > crmd[3472]: 2007/06/18_20:49:46 info: do_lrm_rsc_op: Performing > op=IPaddr_192_168_1_168_stop_0 key=4:6:be5645bb-9fc6-4331-9bce-6cad3a7d88f1) > > pengine[3485]: 2007/06/18_20:49:46 info: process_pe_message: Transition 6: > PEngine Input stored in: /var/lib/heartbeat/pengine/pe-input-150.bz2 > > crmd[3472]: 2007/06/18_20:49:46 WARN: process_lrm_event: LRM operation > IPaddr_192_168_1_168_monitor_5000 (call=8, rc=-2) Cancelled > > lrmd[3469]: 2007/06/18_20:49:46 info: RA output: > (IPaddr_192_168_1_168:stop:stderr) SIOCDELRT: No such process > > IPaddr[3780]: 2007/06/18_20:49:46 INFO: /sbin/ifconfig eth0:0 192.168.1.168 > down > > crmd[3472]: 2007/06/18_20:49:46 info: process_lrm_event: LRM operation > IPaddr_192_168_1_168_stop_0 (call=10, rc=0) complete > > cib[3468]: 2007/06/18_20:49:46 info: cib_diff_notify: Update (client: 3472, > call:41): 0.1.26 -> 0.1.27 (ok) > > tengine[3484]: 2007/06/18_20:49:46 info: te_update_diff: Processing diff > (cib_update): 0.1.26 -> 0.1.27 > > tengine[3484]: 2007/06/18_20:49:46 info: match_graph_event: Action > IPaddr_192_168_1_168_stop_0 (4) confirmed on > db1c3a49-0f90-424a-8f09-c85af7ed83a8 > > tengine[3484]: 2007/06/18_20:49:46 info: run_graph: Transition 6: > (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0) > > tengine[3484]: 2007/06/18_20:49:46 info: notify_crmd: Transition 6 status: > te_complete - <null> > > crmd[3472]: 2007/06/18_20:49:46 info: do_state_transition: sntest2.dclux.loc: > State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS > cause=C_IPC_MESSAGE origin=route_message ] > > cib[3800]: 2007/06/18_20:49:46 info: write_cib_contents: Wrote version 0.1.27 > of the CIB to disk (digest: d622380a2238356b807686a901cd1b6c) > > > > and then I'm in a state where the resource is nowhere. > > > > What am I missing/doing wrong ? > > > > Thanks in advance, > > > > Benjamin > > > ________________________________ > -------------------------------------------------------- > > This e-mail and any attached files are confidential and intended solely for > the use of the individual or entity to whom they are addressed. If you have > received this e-mail by mistake, please notify the sender immediately and > delete it from your system. You must not copy the message or disclose its > contents to anyone. > > -------------------------------------------------------- > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems -- Dejan -------------------------------------------------------- This e-mail and any attached files are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail by mistake, please notify the sender immediately and delete it from your system. You must not copy the message or disclose its contents to anyone. -------------------------------------------------------- _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
