I guess all I wanted to do was generate a cib.xml file from it and then go from there. Now that it is working, I'm done with the haresources file and have removed it. From what I read in your links, IPaddr2 is "Linux specific". Guess I'll be sure to use that from now on.
Thank you for your help

Marian Marinov wrote:
> I really can't understand why you are not using the CRM shell and are still
> using the haresources :)
>
> To see the IP you can use this shortcut command: ip a l
> or ip a l eth0
>
> There is a huge difference between IPaddr and IPaddr2.
>
> Please look here:
> http://www.linux-ha.org/doc/re-ra-IPaddr.html
> http://www.linux-ha.org/doc/re-ra-IPaddr2.html
>
> You can also look here for some crm examples:
> http://clusterlabs.org/wiki/Example_configurations
>
> Regards,
> Marian
>
> On Monday 22 March 2010 23:03:44 mike wrote:
>
>> Hi Marian,
>>
>> Success!!!!
>> I changed my haresources file to look like this:
>> DBSUAT1A.intranet.mydomain.com IPaddr2::172.28.185.49
>> (so I'm using IPaddr2 now and not IPaddr)
>>
>> I removed the old cib.xml and cib.xml.sig. Generated a new cib.xml with
>> /usr/lib64/heartbeat/haresources2cib.py.
>>
>> Removed the haresources and enabled crm again in ha.cf, and the cluster
>> works flawlessly. It fails back and forth as it should. The only issue
>> is that ifconfig does not reveal the ip address; instead I have to
>> use: ip addr show eth0
>>
>> Now the question is of course - why does IPaddr2 work where IPaddr
>> doesn't? That's what I need to figure out.....
>>
>> Marian Marinov wrote:
>>
>>> Hello mike,
>>>
>>> Your problem is pretty simple. You simply don't have an IPaddr
>>> resource configured within pacemaker.
>>>
>>> Please look here for IPaddr:
>>> http://www.linux-ha.org/doc/re-ra-IPaddr.html
>>> And here for IPaddr2:
>>> http://www.linux-ha.org/doc/re-ra-IPaddr2.html
>>>
>>> You should remove the haresources file and configure pacemaker so it can
>>> handle your IP address.
>>>
>>> You can find basic configurations here:
>>> http://clusterlabs.org/wiki/Example_configurations
>>> http://clusterlabs.org/wiki/Example_XML_configurations
>>>
>>> Best regards,
>>> Marian
>>>
>>> On Monday 22 March 2010 01:25:58 mike wrote:
>>>
>>>> Hi Marian - I have included my cib.xml file below.
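[Editor's note on the ifconfig observation above: IPaddr2 adds the VIP as a secondary address via iproute2 rather than creating an eth0:0 alias interface, and ifconfig only lists alias interfaces, so the address is invisible to it. A quick way to check for the VIP, using the interface and address from the thread:]

```shell
# Show every address bound to eth0, including secondary addresses
# that ifconfig cannot see:
ip addr show eth0

# One-line presence test for the VIP (address taken from the thread):
ip -o addr show eth0 | grep 172.28.185.49
```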
>>>> What I have found tonight is that by commenting out the crm entry in the
>>>> ha.cf and re-enabling the haresources file, I am able to fail the ip
>>>> back and forth at will. Here is what my haresources file looks like:
>>>> *DBSUAT1A.intranet.mydomain.com IPaddr::172.28.185.49*
>>>>
>>>> My cib.xml file, which was generated from the above haresources file
>>>> using /usr/lib64/heartbeat/haresources2cib.py:
>>>> [r...@dbsuat1b support]# cat cib.xml
>>>> <cib admin_epoch="0" epoch="7" validate-with="transitional-0.6"
>>>>      crm_feature_set="3.0.1" have-quorum="1"
>>>>      dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f" num_updates="0"
>>>>      cib-last-written="Sun Mar 21 19:10:03 2010">
>>>>   <configuration>
>>>>     <crm_config>
>>>>       <cluster_property_set id="cib-bootstrap-options">
>>>>         <attributes>
>>>>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
>>>>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
>>>>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
>>>>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
>>>>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
>>>>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
>>>>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
>>>>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
>>>>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
>>>>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
>>>>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
>>>>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
>>>>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
>>>>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
>>>>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
>>>>         </attributes>
>>>>       </cluster_property_set>
>>>>     </crm_config>
>>>>     <nodes>
>>>>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
>>>>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
>>>>     </nodes>
>>>>     <resources>
>>>>       <primitive class="ocf" id="IPaddr_172_28_185_49" provider="heartbeat" type="IPaddr">
>>>>         <operations>
>>>>           <op id="IPaddr_172_28_185_49_mon" interval="5s" name="monitor" timeout="5s"/>
>>>>         </operations>
>>>>         <instance_attributes id="IPaddr_172_28_185_49_inst_attr">
>>>>           <attributes>
>>>>             <nvpair id="IPaddr_172_28_185_49_attr_0" name="ip" value="172.28.185.49"/>
>>>>           </attributes>
>>>>         </instance_attributes>
>>>>       </primitive>
>>>>     </resources>
>>>>     <constraints>
>>>>       <rsc_location id="rsc_location_IPaddr_172_28_185_49" rsc="IPaddr_172_28_185_49">
>>>>         <rule id="prefered_location_IPaddr_172_28_185_49" score="100">
>>>>           <expression attribute="#uname" id="prefered_location_IPaddr_172_28_185_49_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
>>>>         </rule>
>>>>       </rsc_location>
>>>>     </constraints>
>>>>   </configuration>
>>>> </cib>
>>>>
>>>> Marian Marinov wrote:
>>>>
>>>>> Can you please give us your crm configuration?
>>>>>
>>>>> Marian
>>>>>
>>>>> On Sunday 21 March 2010 23:30:46 mike wrote:
>>>>>
>>>>>> Thank you Marian. I removed the file as you suggested but
>>>>>> unfortunately it has made no difference. The ip address is simply not
>>>>>> being released when I stop the heartbeat process.
>>>>>>
>>>>>> Does anyone have any ideas where I could start to look at this? The
>>>>>> only way I can get the ip address released is to reboot the node.
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>> Marian Marinov wrote:
>>>>>>
>>>>>>> On Saturday 20 March 2010 03:56:27 mike wrote:
>>>>>>>
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> I have a simple 2 node cluster with a VIP running on RHEL 5.3 on
>>>>>>>> s390. Nothing else configured yet.
>>>>>>>>
>>>>>>>> When I start up the cluster, all is well. The VIP starts up on the
>>>>>>>> home node and crm_mon shows the resource and nodes as online. No
>>>>>>>> errors in the logs.
>>>>>>>>
>>>>>>>> If I issue service heartbeat stop on the main node, the ip fails
>>>>>>>> over to the backup node and crm_mon shows what I would expect,
>>>>>>>> i.e. the ip address is on the backup node and the other node is
>>>>>>>> offline. However, if I do an ifconfig on the main node I see that
>>>>>>>> the eth0:0 entry is still there, so in effect the vip address is
>>>>>>>> now on both servers.
>>>>>>>>
>>>>>>>> If both nodes were up and running and I rebooted the main node then
>>>>>>>> the failover works perfectly.
>>>>>>>>
>>>>>>>> Would anyone know why the nodes seem unable to release the vip
>>>>>>>> unless rebooted?
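[Editor's note: as a stopgap for the stuck address described above, when the stop action times out and leaves the alias behind, the address can usually be removed by hand instead of rebooting. This is a sketch, not from the thread; the /24 netmask is an assumption.]

```shell
# The IPaddr agent creates an alias interface; take it down directly:
ifconfig eth0:0 down

# Or, equivalently, delete the address with iproute2
# (netmask assumed here):
ip addr del 172.28.185.49/24 dev eth0
```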
>>>>>>>> >>>>>>>> ha-log: >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for >>>>>>>> shutdown Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: notice: LogActions: Move resource IPaddr_172_28_185_49 >>>>>>>> (Started dbsuat1a.intranet.mydomain.com -> >>>>>>>> dbsuat1b.intranet.mydomain.com) Mar 19 13:55:12 >>>>>>>> DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_state_transition: State transition S_POLICY_ENGINE -> >>>>>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE >>>>>>>> origin=handle_response ] >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: process_pe_message: Transition 5: PEngine Input stored in: >>>>>>>> /usr/var/lib/pengine/pe-input-337.bz2 >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: process_pe_message: Configuration WARNINGs found during PE >>>>>>>> processing. Please run "crm_verify -L" to identify issues. 
>>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> unpack_graph: Unpacked transition 5: 5 actions in 5 synapses >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_te_invoke: Processing graph 5 (ref=pe_calc-dc-1269021312-26) >>>>>>>> derived from /usr/var/lib/pengine/pe-input-337.bz2 >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> te_rsc_command: Initiating action 6: stop >>>>>>>> IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com >>>>>>>> (local) >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_lrm_rsc_op: Performing >>>>>>>> key=6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f >>>>>>>> op=IPaddr_172_28_185_49_stop_0 ) >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: >>>>>>>> rsc:IPaddr_172_28_185_49:5: stop >>>>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> process_lrm_event: LRM operation IPaddr_172_28_185_49_monitor_5000 >>>>>>>> (call=4, status=1, cib-update=0, confirmed=true) Cancelled >>>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: >>>>>>>> IPaddr_172_28_185_49:stop process (PID 5474) timed out (try 1). >>>>>>>> Killing with signal SIGTERM (15). >>>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: >>>>>>>> Managed IPaddr_172_28_185_49:stop process 5474 killed by signal 15 >>>>>>>> [SIGTERM - Termination (ANSI)]. 
>>>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: >>>>>>>> operation stop[5] on ocf::IPaddr::IPaddr_172_28_185_49 for client >>>>>>>> 4531, its parameters: ip=[172.28.185.49] CRM_meta_timeout=[20000] >>>>>>>> crm_feature_set=[3.0.1] : pid [5474] timed out >>>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: >>>>>>>> process_lrm_event: LRM operation IPaddr_172_28_185_49_stop_0 (5) >>>>>>>> Timed Out (timeout=20000ms) >>>>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: >>>>>>>> status_from_rc: Action 6 (IPaddr_172_28_185_49_stop_0) on >>>>>>>> dbsuat1a.intranet.mydomain.com failed (target: 0 vs. rc: -2): Error >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: >>>>>>>> update_failcount: Updating failcount for IPaddr_172_28_185_49 on >>>>>>>> dbsuat1a.intranet.mydomain.com after failed stop: rc=-2 >>>>>>>> (update=INFINITY, time=1269021333) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> abort_transition_graph: match_graph_event:272 - Triggered transition >>>>>>>> abort (complete=0, tag=lrm_rsc_op, id=IPaddr_172_28_185_49_stop_0, >>>>>>>> magic=2:-2;6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f, cib=0.23.16) >>>>>>>> : Event failed >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> update_abort_priority: Abort priority upgraded from 0 to 1 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> update_abort_priority: Abort action done superceeded by restart >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> match_graph_event: Action IPaddr_172_28_185_49_stop_0 (6) confirmed >>>>>>>> on dbsuat1a.intranet.mydomain.com (rc=4) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> run_graph: ==================================================== >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: 
notice: >>>>>>>> run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=4, >>>>>>>> Incomplete=0, Source=/usr/var/lib/pengine/pe-input-337.bz2): Stopped >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> te_graph_trigger: Transition 5 is now complete >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> find_hash_entry: Creating hash entry for >>>>>>>> fail-count-IPaddr_172_28_185_49 Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_state_transition: State transition S_TRANSITION_ENGINE -> >>>>>>>> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL >>>>>>>> origin=notify_crmd ] Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com >>>>>>>> crmd: [4531]: info: do_state_transition: All 2 cluster nodes are >>>>>>>> eligible to run resources. Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> attrd_trigger_update: Sending flush op to all hosts for: >>>>>>>> fail-count-IPaddr_172_28_185_49 (INFINITY) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> attrd_perform_update: Sent update 24: >>>>>>>> fail-count-IPaddr_172_28_185_49=INFINITY >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_pe_invoke: Query 53: Requesting the current CIB: S_POLICY_ENGINE >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> find_hash_entry: Creating hash entry for >>>>>>>> last-failure-IPaddr_172_28_185_49 Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> abort_transition_graph: te_update_diff:146 - Triggered transition >>>>>>>> abort (complete=1, tag=transient_attributes, >>>>>>>> id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.17) : >>>>>>>> Transient attribute: update >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> attrd_trigger_update: Sending flush op to 
all hosts for: >>>>>>>> last-failure-IPaddr_172_28_185_49 (1269021333) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> attrd_perform_update: Sent update 27: >>>>>>>> last-failure-IPaddr_172_28_185_49=1269021333 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_pe_invoke_callback: Invoking the PE: >>>>>>>> ref=pe_calc-dc-1269021333-28, seq=2, quorate=1 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> abort_transition_graph: te_update_diff:146 - Triggered transition >>>>>>>> abort (complete=1, tag=transient_attributes, >>>>>>>> id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.18) : >>>>>>>> Transient attribute: update >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> notice: update_validation: Upgrading transitional-0.6-style >>>>>>>> configuration to pacemaker-1.0 with >>>>>>>> /usr/share/pacemaker/upgrade06.xsl Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: >>>>>>>> update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl >>>>>>>> successful >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> notice: update_validation: Upgraded from transitional-0.6 to >>>>>>>> pacemaker-1.0 validation >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: cli_config_update: Your configuration was internally updated >>>>>>>> to the latest version (pacemaker-1.0) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_pe_invoke: Query 54: Requesting the current CIB: S_POLICY_ENGINE >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_pe_invoke: Query 55: Requesting the current CIB: S_POLICY_ENGINE >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_pe_invoke_callback: Invoking the PE: >>>>>>>> ref=pe_calc-dc-1269021333-29, 
seq=2, quorate=1 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, >>>>>>>> 'green' = 0 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: info: determine_online_status: Node >>>>>>>> dbsuat1a.intranet.mydomain.com is shutting down >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: unpack_rsc_op: Processing failed op >>>>>>>> IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: >>>>>>>> unknown exec error (-2) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: native_add_running: resource IPaddr_172_28_185_49 isnt managed >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: determine_online_status: Node dbsuat1b.intranet.mydomain.com >>>>>>>> is online Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: notice: native_print: IPaddr_172_28_185_49 >>>>>>>> (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com >>>>>>>> (unmanaged) FAILED >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times >>>>>>>> on dbsuat1a.intranet.mydomain.com >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away >>>>>>>> from dbsuat1a.intranet.mydomain.com after 1000000 failures >>>>>>>> (max=1000000) Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com >>>>>>>> pengine: [4714]: info: native_color: Unmanaged resource >>>>>>>> IPaddr_172_28_185_49 allocated to 'nowhere': failed >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for >>>>>>>> shutdown Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: notice: LogActions: Leave resource 
IPaddr_172_28_185_49 >>>>>>>> (Started unmanaged) Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com >>>>>>>> crmd: [4531]: info: handle_response: pe_calc calculation >>>>>>>> pe_calc-dc-1269021333-28 is obsolete Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: >>>>>>>> process_pe_message: Transition 6: PEngine Input stored in: >>>>>>>> /usr/var/lib/pengine/pe-input-338.bz2 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: process_pe_message: Configuration WARNINGs found during PE >>>>>>>> processing. Please run "crm_verify -L" to identify issues. >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> notice: update_validation: Upgrading transitional-0.6-style >>>>>>>> configuration to pacemaker-1.0 with >>>>>>>> /usr/share/pacemaker/upgrade06.xsl Mar 19 13:55:33 >>>>>>>> DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: >>>>>>>> update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl >>>>>>>> successful >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> notice: update_validation: Upgraded from transitional-0.6 to >>>>>>>> pacemaker-1.0 validation >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: cli_config_update: Your configuration was internally updated >>>>>>>> to the latest version (pacemaker-1.0) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, >>>>>>>> 'green' = 0 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: info: determine_online_status: Node >>>>>>>> dbsuat1a.intranet.mydomain.com is shutting down >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: unpack_rsc_op: Processing failed op >>>>>>>> IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: >>>>>>>> unknown exec error (-2) >>>>>>>> Mar 19 13:55:33 
DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: native_add_running: resource IPaddr_172_28_185_49 isnt managed >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: determine_online_status: Node dbsuat1b.intranet.mydomain.com >>>>>>>> is online Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: notice: native_print: IPaddr_172_28_185_49 >>>>>>>> (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com >>>>>>>> (unmanaged) FAILED >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times >>>>>>>> on dbsuat1a.intranet.mydomain.com >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away >>>>>>>> from dbsuat1a.intranet.mydomain.com after 1000000 failures >>>>>>>> (max=1000000) Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com >>>>>>>> pengine: [4714]: info: native_color: Unmanaged resource >>>>>>>> IPaddr_172_28_185_49 allocated to 'nowhere': failed >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for >>>>>>>> shutdown Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: >>>>>>>> [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 >>>>>>>> (Started unmanaged) Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com >>>>>>>> pengine: [4714]: info: process_pe_message: Transition 7: PEngine >>>>>>>> Input stored in: >>>>>>>> /usr/var/lib/pengine/pe-input-339.bz2 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_state_transition: State transition S_POLICY_ENGINE -> >>>>>>>> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE >>>>>>>> origin=handle_response ] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: process_pe_message: Configuration 
WARNINGs found during PE >>>>>>>> processing. Please run "crm_verify -L" to identify issues. >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> unpack_graph: Unpacked transition 7: 1 actions in 1 synapses >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1269021333-29) >>>>>>>> derived from /usr/var/lib/pengine/pe-input-339.bz2 >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> te_crm_command: Executing crm-event (10): do_shutdown on >>>>>>>> dbsuat1a.intranet.mydomain.com >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> te_crm_command: crm-event (10) is a local shutdown >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> run_graph: ==================================================== >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: >>>>>>>> run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, >>>>>>>> Incomplete=0, Source=/usr/var/lib/pengine/pe-input-339.bz2): >>>>>>>> Complete Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: >>>>>>>> [4531]: info: te_graph_trigger: Transition 7 is now complete >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_state_transition: State transition S_TRANSITION_ENGINE -> >>>>>>>> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_dc_release: DC role released >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> stop_subsystem: Sent -TERM to pengine: [4714] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_te_control: Transitioner is now inactive >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: >>>>>>>> info: crm_signal_dispatch: Invoking 
handler for signal 15: >>>>>>>> Terminated Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: >>>>>>>> [4531]: info: do_te_control: Disconnecting STONITH... >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> tengine_stonith_connection_destroy: Fencing daemon disconnected >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: >>>>>>>> Not currently connected. >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: Terminating the pengine >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> stop_subsystem: Sent -TERM to pengine: [4714] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: Waiting for subsystems to exit >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: >>>>>>>> register_fsa_input_adv: do_shutdown stalled the FSA with pending >>>>>>>> inputs Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: >>>>>>>> info: do_shutdown: All subsystems stopped, continuing >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: >>>>>>>> do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received >>>>>>>> in state S_STOPPING >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: Terminating the pengine >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> stop_subsystem: Sent -TERM to pengine: [4714] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: Waiting for subsystems to exit >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: All subsystems stopped, continuing >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> crmdManagedChildDied: Process pengine:[4714] exited (signal=0, >>>>>>>> exitcode=0) Mar 19 13:55:33 
DBSUAT1A.intranet.mydomain.com crmd: >>>>>>>> [4531]: info: pe_msg_dispatch: Received HUP from pengine:[4714] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> pe_connection_destroy: Connection to the Policy Engine released >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_shutdown: All subsystems stopped, continuing >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: >>>>>>>> verify_stopped: Resource IPaddr_172_28_185_49 was active at >>>>>>>> shutdown. You may ignore this error if it is unmanaged. >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_lrm_control: Disconnected from the LRM >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: >>>>>>>> client (pid=4531) removed from ccm >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_ha_control: Disconnected from Heartbeat >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_cib_control: Disconnecting CIB >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> cib_process_readwrite: We are now in R/O mode >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> crmd_cib_connection_destroy: Connection to the CIB terminated... 
>>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> free_mem: Dropping I_TERMINATE: [ state=S_STOPPING >>>>>>>> cause=C_FSA_INTERNAL origin=do_stop ] >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: >>>>>>>> do_exit: [crmd] stopped (0) >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing /usr/lib64/heartbeat/attrd process group 4530 with >>>>>>>> signal 15 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: >>>>>>>> [4530]: info: crm_signal_dispatch: Invoking handler for signal 15: >>>>>>>> Terminated Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: >>>>>>>> [4530]: info: attrd_shutdown: Exiting >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> main: Exiting... >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: >>>>>>>> attrd_cib_connection_destroy: Connection to the CIB terminated... >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing /usr/lib64/heartbeat/stonithd process group 4529 with >>>>>>>> signal 15 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com stonithd: >>>>>>>> [4529]: notice: /usr/lib64/heartbeat/stonithd normally quit. >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing /usr/lib64/heartbeat/lrmd -r process group 4528 with >>>>>>>> signal 15 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: >>>>>>>> [4528]: info: lrmd is shutting down >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: >>>>>>>> resource IPaddr_172_28_185_49 is left in UNKNOWN status.(last op >>>>>>>> stop finished without LRM_OP_DONE status.) 
>>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing /usr/lib64/heartbeat/cib process group 4527 with >>>>>>>> signal 15 Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: >>>>>>>> [4527]: info: crm_signal_dispatch: Invoking handler for signal 15: >>>>>>>> Terminated Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: >>>>>>>> [4527]: info: >>>>>>>> cib_shutdown: Disconnected 0 clients >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> cib_process_disconnect: All clients disconnected... >>>>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> initiate_exit: Sending disconnect notification to 2 peers... >>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> cib_process_shutdown_req: Shutdown ACK from >>>>>>>> dbsuat1b.intranet.mydomain.com Mar 19 13:55:34 >>>>>>>> DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: >>>>>>>> cib_process_shutdown_req: Disconnecting heartbeat Mar 19 13:55:34 >>>>>>>> DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: >>>>>>>> Exiting... >>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> cib_process_request: Operation complete: op cib_shutdown_req for >>>>>>>> section 'all' >>>>>>>> (origin=dbsuat1b.intranet.mydomain.com/dbsuat1b.intranet.mydomain.co >>>>>>>> m/ (n ull ), version=0.0.0): ok (rc=0) >>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> ha_msg_dispatch: Lost connection to heartbeat service. 
>>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: >>>>>>>> main: Done >>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: >>>>>>>> client (pid=4527) removed from ccm >>>>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing /usr/lib64/heartbeat/ccm process group 4526 with >>>>>>>> signal 15 Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: >>>>>>>> [4526]: info: received SIGTERM, going to shut down >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing HBFIFO process 4522 with signal 15 >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing HBWRITE process 4523 with signal 15 >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: killing HBREAD process 4524 with signal 15 >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: Core process 4524 exited. 3 remaining >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: Core process 4523 exited. 2 remaining >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: Core process 4522 exited. 1 remaining >>>>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: >>>>>>>> info: dbsuat1a.intranet.mydomain.com Heartbeat shutdown complete. 
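[Editor's note: the logs above show the failcount for the resource being pushed to INFINITY, which keeps it off the node until cleared. A hedged sketch of clearing it with the pacemaker 1.0 CLI tools, using the resource and node names from the logs; option spellings are an assumption for that release.]

```shell
# Delete the failcount recorded for the resource on the failed node:
crm_failcount -D -r IPaddr_172_28_185_49 -U dbsuat1a.intranet.mydomain.com

# Clean up the failed stop operation from the resource's history:
crm_resource -C -r IPaddr_172_28_185_49 -H dbsuat1a.intranet.mydomain.com
```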
>>>>>>>>
>>>>>>>> my ha.cf:
>>>>>>>> # Logging
>>>>>>>> debug 1
>>>>>>>> debugfile /var/log/ha-debug
>>>>>>>> logfile /var/log/ha-log
>>>>>>>> logfacility local0
>>>>>>>> #use_logd true
>>>>>>>> #logfacility daemon
>>>>>>>>
>>>>>>>> # Misc Options
>>>>>>>> traditional_compression off
>>>>>>>> compression bz2
>>>>>>>> coredumps true
>>>>>>>>
>>>>>>>> # Communications
>>>>>>>> udpport 691
>>>>>>>> bcast eth0
>>>>>>>> ##autojoin any
>>>>>>>> autojoin none
>>>>>>>>
>>>>>>>> # Thresholds (in seconds)
>>>>>>>> keepalive 1
>>>>>>>> warntime 6
>>>>>>>> deadtime 10
>>>>>>>> initdead 15
>>>>>>>>
>>>>>>>> node dbsuat1a.intranet.mydomain.com
>>>>>>>> node dbsuat1b.intranet.mydomain.com
>>>>>>>> #enable pacemaker
>>>>>>>> crm yes
>>>>>>>> #enable STONITH
>>>>>>>> #crm respawn
>>>>>>>>
>>>>>>>> my haresources:
>>>>>>>> DBSUAT1A.intranet.mydomain.com 172.28.185.49
>>>>>>>
>>>>>>> I don't have very good advice, but you shouldn't use haresources anymore. You should use pacemaker to configure the cluster.
>>>>>>>
>>>>>>> You have already said that you wish to use pacemaker (crm) with this line of your config: crm yes
>>>>>>>
>>>>>>> Remove the haresources file, restart heartbeat on both nodes, and redo the tests.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Linux-HA mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>>>>>> See also: http://linux-ha.org/ReportingProblems

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
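[Editor's note] The pacemaker route suggested above (dropping haresources and letting the CRM manage the floating IP with IPaddr2) can be sketched with the crm shell. This is a hedged sketch only: the resource name ClusterIP, the cidr_netmask=24, the 30s monitor interval, and the location score 100 are illustrative assumptions, not values from the thread; only the address 172.28.185.49, the interface eth0, and the node names come from the configuration shown above.

```shell
# Sketch: define the floating IP from haresources as a pacemaker
# primitive using the IPaddr2 resource agent (iproute2-based, so the
# address appears in "ip addr show", not in ifconfig).
# Assumed values: resource name "ClusterIP", netmask /24, 30s monitor.
crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=172.28.185.49 cidr_netmask=24 nic=eth0 \
    op monitor interval=30s

# Optionally prefer the primary node for the address
# (the score of 100 is an illustrative choice):
crm configure location ClusterIP-pref ClusterIP 100: dbsuat1a.intranet.mydomain.com

# Verify the configuration and watch the cluster state once:
crm configure show
crm_mon -1
```

Because IPaddr2 adds the address as a secondary address via iproute2, checking it with "ip addr show eth0" rather than ifconfig is expected behavior, not a fault.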
