Hello Mike,

Your problem is pretty simple: you don't have an IPaddr resource configured within Pacemaker itself.

Please look here for IPaddr: http://www.linux-ha.org/doc/re-ra-IPaddr.html
And here for IPaddr2: http://www.linux-ha.org/doc/re-ra-IPaddr2.html

You should remove the haresources file and configure Pacemaker so it can handle your IP address.
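As a rough sketch (not a drop-in configuration), the steps could look like this. The resource name ClusterIP is made up, cidr_netmask=24 and nic=eth0 are assumptions you must adapt to your network, and I am using the newer IPaddr2 agent here. Also make sure the node name in any location constraint matches the uname exactly as crm_mon reports it (your generated constraint says DBSUAT1A.intranet.mydomain.com while the node uname is dbsuat1a.intranet.mydomain.com):

    # on both nodes: remove the legacy haresources file and restart heartbeat
    rm /etc/ha.d/haresources
    service heartbeat restart

    # on one node: define the cluster IP under Pacemaker (hypothetical values)
    crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
        params ip=172.28.185.49 cidr_netmask=24 nic=eth0 \
        op monitor interval=10s timeout=20s
    crm configure location ClusterIP-prefers-dbsuat1a ClusterIP 100: dbsuat1a.intranet.mydomain.com

Once the address is managed this way, "service heartbeat stop" should make the leaving node release it cleanly before it is brought up on the peer.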
You can find basic configurations here:
http://clusterlabs.org/wiki/Example_configurations
http://clusterlabs.org/wiki/Example_XML_configurations

Best regards,
Marian

On Monday 22 March 2010 01:25:58 mike wrote:
> Hi Marian - I have included my cib.xml file below.
> What I have found tonight is that by commenting out the crm entry in the
> ha.cf and re-enabling the haresources file, I am able to fail the IP back
> and forth at will. Here is what my haresources file looks like:
>
> *DBSUAT1A.intranet.mydomain.com IPaddr::172.28.185.49*
>
> My cib.xml file, which was generated from the above haresources file using
> /usr/lib64/heartbeat/haresources2cib.py:
>
> [r...@dbsuat1b support]# cat cib.xml
> <cib admin_epoch="0" epoch="7" validate-with="transitional-0.6" crm_feature_set="3.0.1"
>      have-quorum="1" dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f" num_updates="0"
>      cib-last-written="Sun Mar 21 19:10:03 2010">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <attributes>
>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
>         </attributes>
>       </cluster_property_set>
>     </crm_config>
>     <nodes>
>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
>     </nodes>
>     <resources>
>       <primitive class="ocf" id="IPaddr_172_28_185_49" provider="heartbeat" type="IPaddr">
>         <operations>
>           <op id="IPaddr_172_28_185_49_mon" interval="5s" name="monitor" timeout="5s"/>
>         </operations>
>         <instance_attributes id="IPaddr_172_28_185_49_inst_attr">
>           <attributes>
>             <nvpair id="IPaddr_172_28_185_49_attr_0" name="ip" value="172.28.185.49"/>
>           </attributes>
>         </instance_attributes>
>       </primitive>
>     </resources>
>     <constraints>
>       <rsc_location id="rsc_location_IPaddr_172_28_185_49" rsc="IPaddr_172_28_185_49">
>         <rule id="prefered_location_IPaddr_172_28_185_49" score="100">
>           <expression attribute="#uname" id="prefered_location_IPaddr_172_28_185_49_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
>         </rule>
>       </rsc_location>
>     </constraints>
>   </configuration>
> </cib>
>
> Marian Marinov wrote:
> > Can you please give us your crm configuration?
> >
> > Marian
> >
> > On Sunday 21 March 2010 23:30:46 mike wrote:
> >> Thank you Marian. I removed the file as you suggested but unfortunately
> >> it has made no difference. The IP address is simply not being released
> >> when I stop the heartbeat process.
> >>
> >> Does anyone have any ideas where I could start to look at this? The only
> >> way I can get the IP address released is to reboot the node.
> >>
> >> thanks
> >>
> >> Marian Marinov wrote:
> >>> On Saturday 20 March 2010 03:56:27 mike wrote:
> >>>> Hi guys,
> >>>>
> >>>> I have a simple 2-node cluster with a VIP running on RHEL 5.3 on s390.
> >>>> Nothing else is configured yet.
> >>>>
> >>>> When I start up the cluster, all is well. The VIP starts up on the
> >>>> home node and crm_mon shows the resource and nodes as online. No
> >>>> errors in the logs.
> >>>>
> >>>> If I issue "service heartbeat stop" on the main node, the IP fails
> >>>> over to the backup node and crm_mon shows what I would expect, i.e.
> >>>> the IP address is on the backup node and the other node is offline.
> >>>> However, if I run ifconfig on the main node I see that the eth0:0
> >>>> entry is still there, so in effect the VIP is now on both servers.
> >>>>
> >>>> If both nodes are up and running and I reboot the main node, then the
> >>>> failover works perfectly.
> >>>>
> >>>> Would anyone know why the nodes seem unable to release the VIP unless
> >>>> rebooted?
> >>>>
> >>>> ha-log:
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Move resource IPaddr_172_28_185_49 (Started dbsuat1a.intranet.mydomain.com -> dbsuat1b.intranet.mydomain.com)
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 5: PEngine Input stored in: /usr/var/lib/pengine/pe-input-337.bz2
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 5: 5 actions in 5 synapses
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 5 (ref=pe_calc-dc-1269021312-26) derived from /usr/var/lib/pengine/pe-input-337.bz2
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_rsc_command: Initiating action 6: stop IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com (local)
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_rsc_op: Performing key=6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f op=IPaddr_172_28_185_49_stop_0 )
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: rsc:IPaddr_172_28_185_49:5: stop
> >>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: process_lrm_event: LRM operation IPaddr_172_28_185_49_monitor_5000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
> >>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: IPaddr_172_28_185_49:stop process (PID 5474) timed out (try 1). Killing with signal SIGTERM (15).
> >>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: Managed IPaddr_172_28_185_49:stop process 5474 killed by signal 15 [SIGTERM - Termination (ANSI)].
> >>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: operation stop[5] on ocf::IPaddr::IPaddr_172_28_185_49 for client 4531, its parameters: ip=[172.28.185.49] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] : pid [5474] timed out
> >>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: process_lrm_event: LRM operation IPaddr_172_28_185_49_stop_0 (5) Timed Out (timeout=20000ms)
> >>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: status_from_rc: Action 6 (IPaddr_172_28_185_49_stop_0) on dbsuat1a.intranet.mydomain.com failed (target: 0 vs. rc: -2): Error
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: update_failcount: Updating failcount for IPaddr_172_28_185_49 on dbsuat1a.intranet.mydomain.com after failed stop: rc=-2 (update=INFINITY, time=1269021333)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=IPaddr_172_28_185_49_stop_0, magic=2:-2;6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f, cib=0.23.16) : Event failed
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort action done superceeded by restart
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: match_graph_event: Action IPaddr_172_28_185_49_stop_0 (6) confirmed on dbsuat1a.intranet.mydomain.com (rc=4)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-337.bz2): Stopped
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 5 is now complete
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for fail-count-IPaddr_172_28_185_49
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-IPaddr_172_28_185_49 (INFINITY)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 24: fail-count-IPaddr_172_28_185_49=INFINITY
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 53: Requesting the current CIB: S_POLICY_ENGINE
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for last-failure-IPaddr_172_28_185_49
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.17) : Transient attribute: update
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-IPaddr_172_28_185_49 (1269021333)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 27: last-failure-IPaddr_172_28_185_49=1269021333
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-28, seq=2, quorate=1
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.18) : Transient attribute: update
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 54: Requesting the current CIB: S_POLICY_ENGINE
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 55: Requesting the current CIB: S_POLICY_ENGINE
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-29, seq=2, quorate=1
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: handle_response: pe_calc calculation pe_calc-dc-1269021333-28 is obsolete
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 6: PEngine Input stored in: /usr/var/lib/pengine/pe-input-338.bz2
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 7: PEngine Input stored in: /usr/var/lib/pengine/pe-input-339.bz2
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 7: 1 actions in 1 synapses
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1269021333-29) derived from /usr/var/lib/pengine/pe-input-339.bz2
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: Executing crm-event (10): do_shutdown on dbsuat1a.intranet.mydomain.com
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: crm-event (10) is a local shutdown
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-339.bz2): Complete
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 7 is now complete
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_dc_release: DC role released
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Transitioner is now inactive
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Disconnecting STONITH...
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: Not currently connected.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmdManagedChildDied: Process pengine:[4714] exited (signal=0, exitcode=0)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_msg_dispatch: Received HUP from pengine:[4714]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_connection_destroy: Connection to the Policy Engine released
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: verify_stopped: Resource IPaddr_172_28_185_49 was active at shutdown. You may ignore this error if it is unmanaged.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_control: Disconnected from the LRM
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4531) removed from ccm
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_ha_control: Disconnected from Heartbeat
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_cib_control: Disconnecting CIB
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_readwrite: We are now in R/O mode
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: [crmd] stopped (0)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/attrd process group 4530 with signal 15
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_shutdown: Exiting
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: main: Exiting...
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/stonithd process group 4529 with signal 15
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com stonithd: [4529]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/lrmd -r process group 4528 with signal 15
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: lrmd is shutting down
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: resource IPaddr_172_28_185_49 is left in UNKNOWN status.(last op stop finished without LRM_OP_DONE status.)
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/cib process group 4527 with signal 15
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_shutdown: Disconnected 0 clients
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_disconnect: All clients disconnected...
> >>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: initiate_exit: Sending disconnect notification to 2 peers...
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_shutdown_req: Shutdown ACK from dbsuat1b.intranet.mydomain.com
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: Exiting...
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=dbsuat1b.intranet.mydomain.com/dbsuat1b.intranet.mydomain.com/(null), version=0.0.0): ok (rc=0)
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: ha_msg_dispatch: Lost connection to heartbeat service.
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: main: Done
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4527) removed from ccm
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/ccm process group 4526 with signal 15
> >>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: received SIGTERM, going to shut down
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBFIFO process 4522 with signal 15
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBWRITE process 4523 with signal 15
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBREAD process 4524 with signal 15
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4524 exited. 3 remaining
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4523 exited. 2 remaining
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4522 exited. 1 remaining
> >>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: dbsuat1a.intranet.mydomain.com Heartbeat shutdown complete.
> >>>>
> >>>> my ha.cf:
> >>>> # Logging
> >>>> debug 1
> >>>> debugfile /var/log/ha-debug
> >>>> logfile /var/log/ha-log
> >>>> logfacility local0
> >>>> #use_logd true
> >>>> #logfacility daemon
> >>>>
> >>>> # Misc Options
> >>>> traditional_compression off
> >>>> compression bz2
> >>>> coredumps true
> >>>>
> >>>> # Communications
> >>>> udpport 691
> >>>> bcast eth0
> >>>> ##autojoin any
> >>>> autojoin none
> >>>>
> >>>> # Thresholds (in seconds)
> >>>> keepalive 1
> >>>> warntime 6
> >>>> deadtime 10
> >>>> initdead 15
> >>>>
> >>>> node dbsuat1a.intranet.mydomain.com
> >>>> node dbsuat1b.intranet.mydomain.com
> >>>> #enable pacemaker
> >>>> crm yes
> >>>> #enable STONITH
> >>>> #crm respawn
> >>>>
> >>>> my haresources:
> >>>> DBSUAT1A.intranet.mydomain.com 172.28.185.49
> >>>
> >>> I don't have very good advice, but you shouldn't use haresources
> >>> anymore. You should use Pacemaker for configuring the cluster.
> >>>
> >>> You have said that you wish to use Pacemaker (crm) with this line of
> >>> your config: crm yes
> >>>
> >>> Remove the haresources file, restart heartbeat on both nodes and
> >>> redo the tests.

--
Best regards,
Marian Marinov
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
