I really can't understand why you are not using the CRM shell and are still carrying the haresources file around :)
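For reference, a VIP like the one discussed below could be defined directly through the crm shell. This is only a sketch: the resource and constraint names are made up for illustration, and cidr_netmask=24 is an assumption about the subnet.

```shell
# Hypothetical crm shell equivalent of the haresources one-liner below.
# Resource/constraint names and cidr_netmask=24 are assumptions - adjust
# them to your environment before using.
crm configure primitive vip_172_28_185_49 ocf:heartbeat:IPaddr2 \
    params ip=172.28.185.49 cidr_netmask=24 nic=eth0 \
    op monitor interval=5s timeout=20s

# Prefer the home node, mirroring the score="100" location rule in the
# generated cib.xml:
crm configure location vip_prefers_1a vip_172_28_185_49 \
    100: dbsuat1a.intranet.mydomain.com
```

The advantage over haresources2cib.py is that you edit the live configuration and never have to hand-maintain XML.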
To see the IP you can use this shortcut command:
ip a l
or
ip a l eth0

There is a huge difference between IPaddr and IPaddr2. Please look here:
http://www.linux-ha.org/doc/re-ra-IPaddr.html
http://www.linux-ha.org/doc/re-ra-IPaddr2.html

You can also look here for some crm examples:
http://clusterlabs.org/wiki/Example_configurations

Regards,
Marian

On Monday 22 March 2010 23:03:44 mike wrote:
> Hi Marian,
>
> Success!!!!
> I changed my haresources file to look like this:
> DBSUAT1A.intranet.mydomain.com IPaddr2::172.28.185.49
> (so I'm using IPaddr2 now and not IPaddr)
>
> I removed the old cib.xml and cib.xml.sig, generated a new cib.xml with
> /usr/lib64/heartbeat/haresources2cib.py, removed the haresources file,
> and enabled crm again in ha.cf. Now the cluster works flawlessly: it
> fails back and forth as it should. The only issue is that ifconfig does
> not reveal the IP address; instead I have to use:
> ip addr show eth0
>
> Now the question is of course - why does IPaddr2 work where IPaddr
> doesn't? That's what I need to figure out.....
>
> Marian Marinov wrote:
> > Hello mike,
> >
> > Your problem is pretty simple. You simply don't have an IPaddr
> > resource configured within pacemaker.
> >
> > Please look here for IPaddr:
> > http://www.linux-ha.org/doc/re-ra-IPaddr.html
> > And here for IPaddr2:
> > http://www.linux-ha.org/doc/re-ra-IPaddr2.html
> >
> > You should remove the haresources file and configure pacemaker so it can
> > handle your IP address.
> >
> > You can find basic configurations here:
> > http://clusterlabs.org/wiki/Example_configurations
> > http://clusterlabs.org/wiki/Example_XML_configurations
> >
> > Best regards,
> > Marian
> >
> > On Monday 22 March 2010 01:25:58 mike wrote:
> >> Hi Marian - I have included my cib.xml file below.
> >> What I have found tonight is that by commenting out the crm entry in the
> >> ha.cf and re-enabling the haresources file, I am able to fail the ip
> >> back and forth at will.
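The ifconfig observation has a simple explanation: IPaddr brings the VIP up as a labelled alias (eth0:0), which the old net-tools ifconfig can display, while IPaddr2 adds it via iproute2 as an unlabelled secondary address, which ifconfig silently skips. A sketch of the difference (assumes interface eth0 and the /24 netmask; must be run as root, and the second address is invented purely for illustration):

```shell
# IPaddr-style: labelled alias - both tools can see it.
ip addr add 172.28.185.49/24 dev eth0 label eth0:0
ifconfig                # eth0:0 appears in the output

# IPaddr2-style: plain secondary address, no label - only iproute2
# lists it; ifconfig shows nothing extra.
ip addr add 172.28.185.50/24 dev eth0
ip addr show eth0       # shows both addresses
```

So "ifconfig does not reveal the IP" is expected behaviour with IPaddr2, not a malfunction.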
Here is what my haresources file looks like:
> >> DBSUAT1A.intranet.mydomain.com IPaddr::172.28.185.49
> >>
> >> My cib.xml file, which was generated from the above haresources file
> >> using /usr/lib64/heartbeat/haresources2cib.py:
> >>
> >> [r...@dbsuat1b support]# cat cib.xml
> >> <cib admin_epoch="0" epoch="7" validate-with="transitional-0.6"
> >>      crm_feature_set="3.0.1" have-quorum="1"
> >>      dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f" num_updates="0"
> >>      cib-last-written="Sun Mar 21 19:10:03 2010">
> >>   <configuration>
> >>     <crm_config>
> >>       <cluster_property_set id="cib-bootstrap-options">
> >>         <attributes>
> >>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
> >>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
> >>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
> >>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
> >>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
> >>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
> >>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
> >>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
> >>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
> >>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
> >>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
> >>         </attributes>
> >>       </cluster_property_set>
> >>     </crm_config>
> >>     <nodes>
> >>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
> >>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
> >>     </nodes>
> >>     <resources>
> >>       <primitive class="ocf" id="IPaddr_172_28_185_49" provider="heartbeat" type="IPaddr">
> >>         <operations>
> >>           <op id="IPaddr_172_28_185_49_mon" interval="5s" name="monitor" timeout="5s"/>
> >>         </operations>
> >>         <instance_attributes id="IPaddr_172_28_185_49_inst_attr">
> >>           <attributes>
> >>             <nvpair id="IPaddr_172_28_185_49_attr_0" name="ip" value="172.28.185.49"/>
> >>           </attributes>
> >>         </instance_attributes>
> >>       </primitive>
> >>     </resources>
> >>     <constraints>
> >>       <rsc_location id="rsc_location_IPaddr_172_28_185_49" rsc="IPaddr_172_28_185_49">
> >>         <rule id="prefered_location_IPaddr_172_28_185_49" score="100">
> >>           <expression attribute="#uname" id="prefered_location_IPaddr_172_28_185_49_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
> >>         </rule>
> >>       </rsc_location>
> >>     </constraints>
> >>   </configuration>
> >> </cib>
> >>
> >> Marian Marinov wrote:
> >>> Can you please give us your crm configuration?
> >>>
> >>> Marian
> >>>
> >>> On Sunday 21 March 2010 23:30:46 mike wrote:
> >>>> Thank you Marian. I removed the file as you suggested but
> >>>> unfortunately it has made no difference. The IP address is simply not
> >>>> being released when I stop the heartbeat process.
> >>>>
> >>>> Does anyone have any ideas where I could start looking at this? The
> >>>> only way I can get the IP address released is to reboot the node.
> >>>>
> >>>> thanks
> >>>>
> >>>> Marian Marinov wrote:
> >>>>> On Saturday 20 March 2010 03:56:27 mike wrote:
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> I have a simple 2-node cluster with a VIP running on RHEL 5.3 on
> >>>>>> s390. Nothing else is configured yet.
> >>>>>>
> >>>>>> When I start up the cluster, all is well. The VIP starts up on the
> >>>>>> home node and crm_mon shows the resource and nodes as online. No
> >>>>>> errors in the logs.
> >>>>>>
> >>>>>> If I issue "service heartbeat stop" on the main node, the IP fails
> >>>>>> over to the backup node and crm_mon shows what I would expect,
> >>>>>> i.e. the IP address is on the backup node and the other node is
> >>>>>> offline. However, if I run ifconfig on the main node I see that
> >>>>>> the eth0:0 entry is still there, so in effect the VIP address is
> >>>>>> now on both servers.
> >>>>>>
> >>>>>> If both nodes are up and running and I reboot the main node, the
> >>>>>> failover works perfectly.
> >>>>>>
> >>>>>> Would anyone know why the nodes seem unable to release the VIP
> >>>>>> unless rebooted?
> >>>>>>
> >>>>>> ha-log:
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Move resource IPaddr_172_28_185_49 (Started dbsuat1a.intranet.mydomain.com -> dbsuat1b.intranet.mydomain.com)
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 5: PEngine Input stored in: /usr/var/lib/pengine/pe-input-337.bz2
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 5: 5 actions in 5 synapses
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 5 (ref=pe_calc-dc-1269021312-26) derived from /usr/var/lib/pengine/pe-input-337.bz2
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_rsc_command: Initiating action 6: stop IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com (local)
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_rsc_op: Performing key=6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f op=IPaddr_172_28_185_49_stop_0 )
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: rsc:IPaddr_172_28_185_49:5: stop
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: process_lrm_event: LRM operation IPaddr_172_28_185_49_monitor_5000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: IPaddr_172_28_185_49:stop process (PID 5474) timed out (try 1). Killing with signal SIGTERM (15).
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: Managed IPaddr_172_28_185_49:stop process 5474 killed by signal 15 [SIGTERM - Termination (ANSI)].
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: operation stop[5] on ocf::IPaddr::IPaddr_172_28_185_49 for client 4531, its parameters: ip=[172.28.185.49] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] : pid [5474] timed out
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: process_lrm_event: LRM operation IPaddr_172_28_185_49_stop_0 (5) Timed Out (timeout=20000ms)
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: status_from_rc: Action 6 (IPaddr_172_28_185_49_stop_0) on dbsuat1a.intranet.mydomain.com failed (target: 0 vs. rc: -2): Error
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: update_failcount: Updating failcount for IPaddr_172_28_185_49 on dbsuat1a.intranet.mydomain.com after failed stop: rc=-2 (update=INFINITY, time=1269021333)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=IPaddr_172_28_185_49_stop_0, magic=2:-2;6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f, cib=0.23.16) : Event failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort action done superceeded by restart
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: match_graph_event: Action IPaddr_172_28_185_49_stop_0 (6) confirmed on dbsuat1a.intranet.mydomain.com (rc=4)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-337.bz2): Stopped
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 5 is now complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for fail-count-IPaddr_172_28_185_49
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-IPaddr_172_28_185_49 (INFINITY)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 24: fail-count-IPaddr_172_28_185_49=INFINITY
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 53: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for last-failure-IPaddr_172_28_185_49
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.17) : Transient attribute: update
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-IPaddr_172_28_185_49 (1269021333)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 27: last-failure-IPaddr_172_28_185_49=1269021333
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-28, seq=2, quorate=1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.18) : Transient attribute: update
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 54: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 55: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-29, seq=2, quorate=1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: handle_response: pe_calc calculation pe_calc-dc-1269021333-28 is obsolete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 6: PEngine Input stored in: /usr/var/lib/pengine/pe-input-338.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 7: PEngine Input stored in: /usr/var/lib/pengine/pe-input-339.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 7: 1 actions in 1 synapses
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1269021333-29) derived from /usr/var/lib/pengine/pe-input-339.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: Executing crm-event (10): do_shutdown on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: crm-event (10) is a local shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-339.bz2): Complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 7 is now complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_dc_release: DC role released
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Transitioner is now inactive
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Disconnecting STONITH...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: Not currently connected.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmdManagedChildDied: Process pengine:[4714] exited (signal=0, exitcode=0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_msg_dispatch: Received HUP from pengine:[4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_connection_destroy: Connection to the Policy Engine released
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: verify_stopped: Resource IPaddr_172_28_185_49 was active at shutdown. You may ignore this error if it is unmanaged.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_control: Disconnected from the LRM
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4531) removed from ccm
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_ha_control: Disconnected from Heartbeat
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_cib_control: Disconnecting CIB
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_readwrite: We are now in R/O mode
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: [crmd] stopped (0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/attrd process group 4530 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_shutdown: Exiting
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: main: Exiting...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/stonithd process group 4529 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com stonithd: [4529]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/lrmd -r process group 4528 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: lrmd is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: resource IPaddr_172_28_185_49 is left in UNKNOWN status.(last op stop finished without LRM_OP_DONE status.)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/cib process group 4527 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_shutdown: Disconnected 0 clients
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_disconnect: All clients disconnected...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: initiate_exit: Sending disconnect notification to 2 peers...
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_shutdown_req: Shutdown ACK from dbsuat1b.intranet.mydomain.com
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: Exiting...
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=dbsuat1b.intranet.mydomain.com/dbsuat1b.intranet.mydomain.com/(null), version=0.0.0): ok (rc=0)
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: ha_msg_dispatch: Lost connection to heartbeat service.
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: main: Done
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4527) removed from ccm
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/ccm process group 4526 with signal 15
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: received SIGTERM, going to shut down
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBFIFO process 4522 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBWRITE process 4523 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBREAD process 4524 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4524 exited. 3 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4523 exited. 2 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4522 exited. 1 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: dbsuat1a.intranet.mydomain.com Heartbeat shutdown complete.
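The key events in the log above are the lrmd stop timeout and SIGTERM kill: the IPaddr agent's stop action hung for the full 20-second timeout, so pacemaker marked the stop as failed and left the resource unmanaged (which is why the address stayed on eth0:0). A stop hang like this can usually be reproduced outside the cluster by invoking the resource agent by hand with the same parameter. The OCF paths below are the conventional locations and an assumption for this system; verify them first (e.g. with rpm -ql heartbeat | grep IPaddr).

```shell
# Hypothetical manual test of the IPaddr agent's stop action, run as
# root on the node that is holding the VIP.
export OCF_ROOT=/usr/lib/ocf               # conventional default; verify
export OCF_RESKEY_ip=172.28.185.49         # same parameter the CIB passes
time /usr/lib/ocf/resource.d/heartbeat/IPaddr stop
echo "rc=$?"
# If this hangs for ~20 s (matching the lrmd timeout in the log), the
# problem is in the agent itself, not in pacemaker's handling of it.
```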
> >>>>>>
> >>>>>> my ha.cf:
> >>>>>> # Logging
> >>>>>> debug 1
> >>>>>> debugfile /var/log/ha-debug
> >>>>>> logfile /var/log/ha-log
> >>>>>> logfacility local0
> >>>>>> #use_logd true
> >>>>>> #logfacility daemon
> >>>>>>
> >>>>>> # Misc Options
> >>>>>> traditional_compression off
> >>>>>> compression bz2
> >>>>>> coredumps true
> >>>>>>
> >>>>>> # Communications
> >>>>>> udpport 691
> >>>>>> bcast eth0
> >>>>>> ##autojoin any
> >>>>>> autojoin none
> >>>>>>
> >>>>>> # Thresholds (in seconds)
> >>>>>> keepalive 1
> >>>>>> warntime 6
> >>>>>> deadtime 10
> >>>>>> initdead 15
> >>>>>>
> >>>>>> node dbsuat1a.intranet.mydomain.com
> >>>>>> node dbsuat1b.intranet.mydomain.com
> >>>>>> #enable pacemaker
> >>>>>> crm yes
> >>>>>> #enable STONITH
> >>>>>> #crm respawn
> >>>>>>
> >>>>>> my haresources:
> >>>>>> DBSUAT1A.intranet.mydomain.com 172.28.185.49
> >>>>>
> >>>>> I don't have very good advice, but you shouldn't use haresources
> >>>>> anymore. You should use pacemaker for configuring the cluster.
> >>>>>
> >>>>> You have said that you wish to use pacemaker (crm) with this line of
> >>>>> your config: crm yes
> >>>>>
> >>>>> Remove the haresources file, restart heartbeat on both nodes and
> >>>>> redo the tests.
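The migration that this thread describes can be summarised as a short procedure. This is only a sketch: the CIB directory /var/lib/heartbeat/crm is the usual default and the haresources path assumes a standard /etc/ha.d layout; confirm both on your install before running anything.

```shell
# Sketch of the haresources -> CIB migration discussed in this thread.
# Run on both nodes; paths are assumptions - verify them first.
service heartbeat stop

# Drop the stale generated CIB (cib.xml and cib.xml.sig):
rm -f /var/lib/heartbeat/crm/cib.xml /var/lib/heartbeat/crm/cib.xml.sig

# Regenerate the CIB from the current haresources file:
/usr/lib64/heartbeat/haresources2cib.py

# Take haresources out of play so only pacemaker manages resources,
# and make sure ha.cf still contains:  crm yes
mv /etc/ha.d/haresources /etc/ha.d/haresources.off

service heartbeat start
```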
--
Best regards,
Marian Marinov
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
