Hi Marian, success! I changed my haresources file to look like this (so I'm now using IPaddr2 instead of IPaddr):

DBSUAT1A.intranet.mydomain.com IPaddr2::172.28.185.49
I removed the old cib.xml and cib.xml.sig, generated a new cib.xml with /usr/lib64/heartbeat/haresources2cib.py, removed the haresources file, and enabled crm again in ha.cf — and the cluster works flawlessly. It fails back and forth as it should.

The only issue is that ifconfig does not reveal the IP address; instead I have to use:

ip addr show eth0

Now the question, of course, is: why does IPaddr2 work where IPaddr doesn't? That's what I need to figure out.

Marian Marinov wrote:
> Hello mike,
>
> Your problem is pretty simple: you simply don't have an IPaddr resource
> configured within pacemaker.
>
> Please look here for IPaddr:
> http://www.linux-ha.org/doc/re-ra-IPaddr.html
> And here for IPaddr2:
> http://www.linux-ha.org/doc/re-ra-IPaddr2.html
>
> You should remove the haresources file and configure pacemaker so it can
> handle your IP address.
>
> You can find basic configurations here:
> http://clusterlabs.org/wiki/Example_configurations
> http://clusterlabs.org/wiki/Example_XML_configurations
>
> Best regards,
> Marian
>
> On Monday 22 March 2010 01:25:58 mike wrote:
>
>> Hi Marian - I have included my cib.xml file below.
>> What I have found tonight is that by commenting out the crm entry in
>> ha.cf and re-enabling the haresources file, I am able to fail the IP
>> back and forth at will.
>> Here is what my haresources file looks like:
>> *DBSUAT1A.intranet.mydomain.com IPaddr::172.28.185.49*
>>
>> My cib.xml file, which was generated from the above haresources file
>> using /usr/lib64/heartbeat/haresources2cib.py:
>>
>> [r...@dbsuat1b support]# cat cib.xml
>> <cib admin_epoch="0" epoch="7" validate-with="transitional-0.6" crm_feature_set="3.0.1" have-quorum="1" dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f" num_updates="0" cib-last-written="Sun Mar 21 19:10:03 2010">
>>   <configuration>
>>     <crm_config>
>>       <cluster_property_set id="cib-bootstrap-options">
>>         <attributes>
>>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
>>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
>>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
>>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
>>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
>>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
>>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
>>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
>>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
>>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
>>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
>>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
>>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
>>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
>>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
>>         </attributes>
>>       </cluster_property_set>
>>     </crm_config>
>>     <nodes>
>>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
>>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
>>     </nodes>
>>     <resources>
>>       <primitive class="ocf" id="IPaddr_172_28_185_49" provider="heartbeat" type="IPaddr">
>>         <operations>
>>           <op id="IPaddr_172_28_185_49_mon" interval="5s" name="monitor" timeout="5s"/>
>>         </operations>
>>         <instance_attributes id="IPaddr_172_28_185_49_inst_attr">
>>           <attributes>
>>             <nvpair id="IPaddr_172_28_185_49_attr_0" name="ip" value="172.28.185.49"/>
>>           </attributes>
>>         </instance_attributes>
>>       </primitive>
>>     </resources>
>>     <constraints>
>>       <rsc_location id="rsc_location_IPaddr_172_28_185_49" rsc="IPaddr_172_28_185_49">
>>         <rule id="prefered_location_IPaddr_172_28_185_49" score="100">
>>           <expression attribute="#uname" id="prefered_location_IPaddr_172_28_185_49_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
>>         </rule>
>>       </rsc_location>
>>     </constraints>
>>   </configuration>
>> </cib>
>>
>> Marian Marinov wrote:
>>
>>> Can you please give us your crm configuration?
>>>
>>> Marian
>>>
>>> On Sunday 21 March 2010 23:30:46 mike wrote:
>>>
>>>> Thank you Marian. I removed the file as you suggested, but unfortunately
>>>> it has made no difference. The IP address is simply not being released
>>>> when I stop the heartbeat process.
>>>>
>>>> Does anyone have any ideas where I could start to look at this? The only
>>>> way I can get the IP address released is to reboot the node.
>>>>
>>>> thanks
>>>>
>>>> Marian Marinov wrote:
>>>>
>>>>> On Saturday 20 March 2010 03:56:27 mike wrote:
>>>>>
>>>>>> Hi guys,
>>>>>>
>>>>>> I have a simple 2-node cluster with a VIP running on RHEL 5.3 on s390.
>>>>>> Nothing else configured yet.
>>>>>>
>>>>>> When I start up the cluster, all is well. The VIP starts up on the
>>>>>> home node and crm_mon shows the resource and nodes as online. No
>>>>>> errors in the logs.
>>>>>>
>>>>>> If I issue "service heartbeat stop" on the main node, the IP fails over
>>>>>> to the backup node and crm_mon shows what I would expect,
>>>>>> i.e. the IP address is on the backup node and the other node is
>>>>>> offline. However, if I do an ifconfig on the main node I see that the
>>>>>> eth0:0 entry is still there, so in effect the VIP address is now on
>>>>>> both servers.
>>>>>>
>>>>>> If both nodes were up and running and I rebooted the main node, then
>>>>>> the failover works perfectly.
>>>>>>
>>>>>> Would anyone know why the nodes seem unable to release the VIP unless
>>>>>> rebooted?
>>>>>>
>>>>>> ha-log:
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Move resource IPaddr_172_28_185_49 (Started dbsuat1a.intranet.mydomain.com -> dbsuat1b.intranet.mydomain.com)
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 5: PEngine Input stored in: /usr/var/lib/pengine/pe-input-337.bz2
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 5: 5 actions in 5 synapses
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 5 (ref=pe_calc-dc-1269021312-26) derived from /usr/var/lib/pengine/pe-input-337.bz2
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_rsc_command: Initiating action 6: stop IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com (local)
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_rsc_op: Performing key=6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f op=IPaddr_172_28_185_49_stop_0 )
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: rsc:IPaddr_172_28_185_49:5: stop
>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: process_lrm_event: LRM operation IPaddr_172_28_185_49_monitor_5000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: IPaddr_172_28_185_49:stop process (PID 5474) timed out (try 1). Killing with signal SIGTERM (15).
>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: Managed IPaddr_172_28_185_49:stop process 5474 killed by signal 15 [SIGTERM - Termination (ANSI)].
>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: operation stop[5] on ocf::IPaddr::IPaddr_172_28_185_49 for client 4531, its parameters: ip=[172.28.185.49] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] : pid [5474] timed out
>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: process_lrm_event: LRM operation IPaddr_172_28_185_49_stop_0 (5) Timed Out (timeout=20000ms)
>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: status_from_rc: Action 6 (IPaddr_172_28_185_49_stop_0) on dbsuat1a.intranet.mydomain.com failed (target: 0 vs. rc: -2): Error
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: update_failcount: Updating failcount for IPaddr_172_28_185_49 on dbsuat1a.intranet.mydomain.com after failed stop: rc=-2 (update=INFINITY, time=1269021333)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=IPaddr_172_28_185_49_stop_0, magic=2:-2;6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f, cib=0.23.16) : Event failed
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort priority upgraded from 0 to 1
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort action done superceeded by restart
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: match_graph_event: Action IPaddr_172_28_185_49_stop_0 (6) confirmed on dbsuat1a.intranet.mydomain.com (rc=4)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-337.bz2): Stopped
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 5 is now complete
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for fail-count-IPaddr_172_28_185_49
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-IPaddr_172_28_185_49 (INFINITY)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 24: fail-count-IPaddr_172_28_185_49=INFINITY
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 53: Requesting the current CIB: S_POLICY_ENGINE
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for last-failure-IPaddr_172_28_185_49
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.17) : Transient attribute: update
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-IPaddr_172_28_185_49 (1269021333)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 27: last-failure-IPaddr_172_28_185_49=1269021333
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-28, seq=2, quorate=1
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.18) : Transient attribute: update
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 54: Requesting the current CIB: S_POLICY_ENGINE
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 55: Requesting the current CIB: S_POLICY_ENGINE
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-29, seq=2, quorate=1
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: handle_response: pe_calc calculation pe_calc-dc-1269021333-28 is obsolete
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 6: PEngine Input stored in: /usr/var/lib/pengine/pe-input-338.bz2
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 7: PEngine Input stored in: /usr/var/lib/pengine/pe-input-339.bz2
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 7: 1 actions in 1 synapses
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1269021333-29) derived from /usr/var/lib/pengine/pe-input-339.bz2
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: Executing crm-event (10): do_shutdown on dbsuat1a.intranet.mydomain.com
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: crm-event (10) is a local shutdown
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-339.bz2): Complete
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 7 is now complete
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_dc_release: DC role released
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Transitioner is now inactive
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Disconnecting STONITH...
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: Not currently connected.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmdManagedChildDied: Process pengine:[4714] exited (signal=0, exitcode=0)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_msg_dispatch: Received HUP from pengine:[4714]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_connection_destroy: Connection to the Policy Engine released
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: verify_stopped: Resource IPaddr_172_28_185_49 was active at shutdown. You may ignore this error if it is unmanaged.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_control: Disconnected from the LRM
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4531) removed from ccm
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_ha_control: Disconnected from Heartbeat
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_cib_control: Disconnecting CIB
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_readwrite: We are now in R/O mode
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: [crmd] stopped (0)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/attrd process group 4530 with signal 15
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_shutdown: Exiting
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: main: Exiting...
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/stonithd process group 4529 with signal 15
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com stonithd: [4529]: notice: /usr/lib64/heartbeat/stonithd normally quit.
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/lrmd -r process group 4528 with signal 15
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: lrmd is shutting down
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: resource IPaddr_172_28_185_49 is left in UNKNOWN status.(last op stop finished without LRM_OP_DONE status.)
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/cib process group 4527 with signal 15
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_shutdown: Disconnected 0 clients
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_disconnect: All clients disconnected...
>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: initiate_exit: Sending disconnect notification to 2 peers...
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_shutdown_req: Shutdown ACK from dbsuat1b.intranet.mydomain.com
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: Exiting...
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=dbsuat1b.intranet.mydomain.com/dbsuat1b.intranet.mydomain.com/(null), version=0.0.0): ok (rc=0)
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: ha_msg_dispatch: Lost connection to heartbeat service.
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: main: Done
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4527) removed from ccm
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/ccm process group 4526 with signal 15
>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: received SIGTERM, going to shut down
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBFIFO process 4522 with signal 15
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBWRITE process 4523 with signal 15
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBREAD process 4524 with signal 15
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4524 exited. 3 remaining
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4523 exited. 2 remaining
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4522 exited. 1 remaining
>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: dbsuat1a.intranet.mydomain.com Heartbeat shutdown complete.
>>>>>>
>>>>>> my ha.cf:
>>>>>>
>>>>>> # Logging
>>>>>> debug 1
>>>>>> debugfile /var/log/ha-debug
>>>>>> logfile /var/log/ha-log
>>>>>> logfacility local0
>>>>>> #use_logd true
>>>>>> #logfacility daemon
>>>>>>
>>>>>> # Misc Options
>>>>>> traditional_compression off
>>>>>> compression bz2
>>>>>> coredumps true
>>>>>>
>>>>>> # Communications
>>>>>> udpport 691
>>>>>> bcast eth0
>>>>>> ##autojoin any
>>>>>> autojoin none
>>>>>>
>>>>>> # Thresholds (in seconds)
>>>>>> keepalive 1
>>>>>> warntime 6
>>>>>> deadtime 10
>>>>>> initdead 15
>>>>>>
>>>>>> node dbsuat1a.intranet.mydomain.com
>>>>>> node dbsuat1b.intranet.mydomain.com
>>>>>> #enable pacemaker
>>>>>> crm yes
>>>>>> #enable STONITH
>>>>>> #crm respawn
>>>>>>
>>>>>> my haresources:
>>>>>> DBSUAT1A.intranet.mydomain.com 172.28.185.49
>>>>>
>>>>> I don't have very good advice, but you shouldn't use haresources
>>>>> anymore. You should use pacemaker for configuring the cluster.
>>>>>
>>>>> You have said that you wish to use pacemaker (crm) with this line of
>>>>> your config: crm yes
>>>>>
>>>>> Remove the haresources file, restart heartbeat on both nodes and
>>>>> redo the tests.
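Marian's suggestion (let pacemaker manage the address instead of haresources) can be sketched with the crm shell. This is an illustrative sketch only: the resource name "ClusterIP" and the cidr_netmask value are assumed, not taken from the thread, and should be adjusted to the actual network.

```shell
# Sketch: define the VIP as a pacemaker-managed IPaddr2 resource.
# "ClusterIP" and cidr_netmask=24 are assumed values - adjust to your setup.
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=172.28.185.49 cidr_netmask=24 \
    op monitor interval=5s timeout=20s

# Prefer the home node, mirroring the score=100 location rule
# that haresources2cib.py generated in the cib.xml earlier in the thread.
crm configure location ClusterIP-prefers-dbsuat1a ClusterIP \
    100: dbsuat1a.intranet.mydomain.com
```

These commands need a running pacemaker cluster, so they are shown as a configuration fragment rather than something to run standalone.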
>>>>> >>>>> >>>>> ----------------------------------------------------------------------- >>>>> - >>>>> >>>>> _______________________________________________ >>>>> Linux-HA mailing list >>>>> [email protected] >>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>>>> See also: http://linux-ha.org/ReportingProblems >>>>> >>>> _______________________________________________ >>>> Linux-HA mailing list >>>> [email protected] >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>>> See also: http://linux-ha.org/ReportingProblems >>>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Linux-HA mailing list >>> [email protected] >>> http://lists.linux-ha.org/mailman/listinfo/linux-ha >>> See also: http://linux-ha.org/ReportingProblems >>> >> _______________________________________________ >> Linux-HA mailing list >> [email protected] >> http://lists.linux-ha.org/mailman/listinfo/linux-ha >> See also: http://linux-ha.org/ReportingProblems >> >> > > > ------------------------------------------------------------------------ > > _______________________________________________ > Linux-HA mailing list > [email protected] > http://lists.linux-ha.org/mailman/listinfo/linux-ha > See also: http://linux-ha.org/ReportingProblems _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
