I really can't understand why you are not using the CRM shell and are still carrying the haresources file around :)
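For reference, a VIP like the one discussed below could be defined directly through the crm shell. This is only a sketch: the resource and constraint names are made up for illustration, and cidr_netmask=24 is an assumption about the subnet.

```shell
# Hypothetical crm shell equivalent of the haresources one-liner below.
# Resource/constraint names and cidr_netmask=24 are assumptions - adjust
# them to your environment before using.
crm configure primitive vip_172_28_185_49 ocf:heartbeat:IPaddr2 \
    params ip=172.28.185.49 cidr_netmask=24 nic=eth0 \
    op monitor interval=5s timeout=20s

# Prefer the home node, mirroring the score="100" location rule in the
# generated cib.xml:
crm configure location vip_prefers_1a vip_172_28_185_49 \
    100: dbsuat1a.intranet.mydomain.com
```

The advantage over haresources2cib.py is that you edit the live configuration and never have to hand-maintain XML.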
To see the IP you can use this shortcut command:
ip a l
or
ip a l eth0

There is a huge difference between IPaddr and IPaddr2. Please look here:
http://www.linux-ha.org/doc/re-ra-IPaddr.html
http://www.linux-ha.org/doc/re-ra-IPaddr2.html

You can also look here for some crm examples:
http://clusterlabs.org/wiki/Example_configurations

Regards,
Marian

On Monday 22 March 2010 23:03:44 mike wrote:
> Hi Marian,
>
> Success!!!!
> I changed my haresources file to look like this:
> DBSUAT1A.intranet.mydomain.com IPaddr2::172.28.185.49
> (so I'm using IPaddr2 now and not IPaddr)
>
> I removed the old cib.xml and cib.xml.sig, generated a new cib.xml with
> /usr/lib64/heartbeat/haresources2cib.py, removed the haresources file,
> and enabled crm again in ha.cf. Now the cluster works flawlessly: it
> fails back and forth as it should. The only issue is that ifconfig does
> not reveal the IP address; instead I have to use:
> ip addr show eth0
>
> Now the question is of course - why does IPaddr2 work where IPaddr
> doesn't? That's what I need to figure out.....
>
> Marian Marinov wrote:
> > Hello mike,
> >
> > Your problem is pretty simple. You simply don't have an IPaddr
> > resource configured within pacemaker.
> >
> > Please look here for IPaddr:
> > http://www.linux-ha.org/doc/re-ra-IPaddr.html
> > And here for IPaddr2:
> > http://www.linux-ha.org/doc/re-ra-IPaddr2.html
> >
> > You should remove the haresources file and configure pacemaker so it can
> > handle your IP address.
> >
> > You can find basic configurations here:
> > http://clusterlabs.org/wiki/Example_configurations
> > http://clusterlabs.org/wiki/Example_XML_configurations
> >
> > Best regards,
> > Marian
> >
> > On Monday 22 March 2010 01:25:58 mike wrote:
> >> Hi Marian - I have included my cib.xml file below.
> >> What I have found tonight is that by commenting out the crm entry in the
> >> ha.cf and re-enabling the haresources file, I am able to fail the ip
> >> back and forth at will.
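The ifconfig observation has a simple explanation: IPaddr brings the VIP up as a labelled alias (eth0:0), which the old net-tools ifconfig can display, while IPaddr2 adds it via iproute2 as an unlabelled secondary address, which ifconfig silently skips. A sketch of the difference (assumes interface eth0 and the /24 netmask; must be run as root, and the second address is invented purely for illustration):

```shell
# IPaddr-style: labelled alias - both tools can see it.
ip addr add 172.28.185.49/24 dev eth0 label eth0:0
ifconfig                # eth0:0 appears in the output

# IPaddr2-style: plain secondary address, no label - only iproute2
# lists it; ifconfig shows nothing extra.
ip addr add 172.28.185.50/24 dev eth0
ip addr show eth0       # shows both addresses
```

So "ifconfig does not reveal the IP" is expected behaviour with IPaddr2, not a malfunction.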
Here is what my haresources file looks like:
> >> DBSUAT1A.intranet.mydomain.com IPaddr::172.28.185.49
> >>
> >> My cib.xml file, which was generated from the above haresources file
> >> using /usr/lib64/heartbeat/haresources2cib.py:
> >>
> >> [r...@dbsuat1b support]# cat cib.xml
> >> <cib admin_epoch="0" epoch="7" validate-with="transitional-0.6"
> >>      crm_feature_set="3.0.1" have-quorum="1"
> >>      dc-uuid="e99889ee-da15-4b09-bfc7-641e3ac0687f" num_updates="0"
> >>      cib-last-written="Sun Mar 21 19:10:03 2010">
> >>   <configuration>
> >>     <crm_config>
> >>       <cluster_property_set id="cib-bootstrap-options">
> >>         <attributes>
> >>           <nvpair id="cib-bootstrap-options-symmetric-cluster" name="symmetric-cluster" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="stop"/>
> >>           <nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="0"/>
> >>           <nvpair id="cib-bootstrap-options-default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="0"/>
> >>           <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
> >>           <nvpair id="cib-bootstrap-options-stonith-action" name="stonith-action" value="reboot"/>
> >>           <nvpair id="cib-bootstrap-options-startup-fencing" name="startup-fencing" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-stop-orphan-resources" name="stop-orphan-resources" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-stop-orphan-actions" name="stop-orphan-actions" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-remove-after-stop" name="remove-after-stop" value="false"/>
> >>           <nvpair id="cib-bootstrap-options-short-resource-names" name="short-resource-names" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-transition-idle-timeout" name="transition-idle-timeout" value="5min"/>
> >>           <nvpair id="cib-bootstrap-options-default-action-timeout" name="default-action-timeout" value="20s"/>
> >>           <nvpair id="cib-bootstrap-options-is-managed-default" name="is-managed-default" value="true"/>
> >>           <nvpair id="cib-bootstrap-options-cluster-delay" name="cluster-delay" value="60s"/>
> >>           <nvpair id="cib-bootstrap-options-pe-error-series-max" name="pe-error-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-pe-warn-series-max" name="pe-warn-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-pe-input-series-max" name="pe-input-series-max" value="-1"/>
> >>           <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.6-17fe0022afda074a937d934b3eb625eccd1f01ef"/>
> >>           <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
> >>         </attributes>
> >>       </cluster_property_set>
> >>     </crm_config>
> >>     <nodes>
> >>       <node id="db80324b-c9de-4995-a66a-eedf93abb42c" uname="dbsuat1a.intranet.mydomain.com" type="normal"/>
> >>       <node id="e99889ee-da15-4b09-bfc7-641e3ac0687f" uname="dbsuat1b.intranet.mydomain.com" type="normal"/>
> >>     </nodes>
> >>     <resources>
> >>       <primitive class="ocf" id="IPaddr_172_28_185_49" provider="heartbeat" type="IPaddr">
> >>         <operations>
> >>           <op id="IPaddr_172_28_185_49_mon" interval="5s" name="monitor" timeout="5s"/>
> >>         </operations>
> >>         <instance_attributes id="IPaddr_172_28_185_49_inst_attr">
> >>           <attributes>
> >>             <nvpair id="IPaddr_172_28_185_49_attr_0" name="ip" value="172.28.185.49"/>
> >>           </attributes>
> >>         </instance_attributes>
> >>       </primitive>
> >>     </resources>
> >>     <constraints>
> >>       <rsc_location id="rsc_location_IPaddr_172_28_185_49" rsc="IPaddr_172_28_185_49">
> >>         <rule id="prefered_location_IPaddr_172_28_185_49" score="100">
> >>           <expression attribute="#uname" id="prefered_location_IPaddr_172_28_185_49_expr" operation="eq" value="DBSUAT1A.intranet.mydomain.com"/>
> >>         </rule>
> >>       </rsc_location>
> >>     </constraints>
> >>   </configuration>
> >> </cib>
> >>
> >> Marian Marinov wrote:
> >>> Can you please give us your crm configuration?
> >>>
> >>> Marian
> >>>
> >>> On Sunday 21 March 2010 23:30:46 mike wrote:
> >>>> Thank you Marian. I removed the file as you suggested but
> >>>> unfortunately it has made no difference. The IP address is simply not
> >>>> being released when I stop the heartbeat process.
> >>>>
> >>>> Does anyone have any ideas where I could start looking at this? The
> >>>> only way I can get the IP address released is to reboot the node.
> >>>>
> >>>> thanks
> >>>>
> >>>> Marian Marinov wrote:
> >>>>> On Saturday 20 March 2010 03:56:27 mike wrote:
> >>>>>> Hi guys,
> >>>>>>
> >>>>>> I have a simple 2-node cluster with a VIP running on RHEL 5.3 on
> >>>>>> s390. Nothing else is configured yet.
> >>>>>>
> >>>>>> When I start up the cluster, all is well. The VIP starts up on the
> >>>>>> home node and crm_mon shows the resource and nodes as online. No
> >>>>>> errors in the logs.
> >>>>>>
> >>>>>> If I issue "service heartbeat stop" on the main node, the IP fails
> >>>>>> over to the backup node and crm_mon shows what I would expect,
> >>>>>> i.e. the IP address is on the backup node and the other node is
> >>>>>> offline. However, if I run ifconfig on the main node I see that
> >>>>>> the eth0:0 entry is still there, so in effect the VIP address is
> >>>>>> now on both servers.
> >>>>>>
> >>>>>> If both nodes are up and running and I reboot the main node, the
> >>>>>> failover works perfectly.
> >>>>>>
> >>>>>> Would anyone know why the nodes seem unable to release the VIP
> >>>>>> unless rebooted?
> >>>>>>
> >>>>>> ha-log:
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Move resource IPaddr_172_28_185_49 (Started dbsuat1a.intranet.mydomain.com -> dbsuat1b.intranet.mydomain.com)
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 5: PEngine Input stored in: /usr/var/lib/pengine/pe-input-337.bz2
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 5: 5 actions in 5 synapses
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 5 (ref=pe_calc-dc-1269021312-26) derived from /usr/var/lib/pengine/pe-input-337.bz2
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_rsc_command: Initiating action 6: stop IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com (local)
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_rsc_op: Performing key=6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f op=IPaddr_172_28_185_49_stop_0 )
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: rsc:IPaddr_172_28_185_49:5: stop
> >>>>>> Mar 19 13:55:12 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: process_lrm_event: LRM operation IPaddr_172_28_185_49_monitor_5000 (call=4, status=1, cib-update=0, confirmed=true) Cancelled
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: IPaddr_172_28_185_49:stop process (PID 5474) timed out (try 1). Killing with signal SIGTERM (15).
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: Managed IPaddr_172_28_185_49:stop process 5474 killed by signal 15 [SIGTERM - Termination (ANSI)].
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: operation stop[5] on ocf::IPaddr::IPaddr_172_28_185_49 for client 4531, its parameters: ip=[172.28.185.49] CRM_meta_timeout=[20000] crm_feature_set=[3.0.1] : pid [5474] timed out
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: process_lrm_event: LRM operation IPaddr_172_28_185_49_stop_0 (5) Timed Out (timeout=20000ms)
> >>>>>> Mar 19 13:55:32 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: status_from_rc: Action 6 (IPaddr_172_28_185_49_stop_0) on dbsuat1a.intranet.mydomain.com failed (target: 0 vs. rc: -2): Error
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: update_failcount: Updating failcount for IPaddr_172_28_185_49 on dbsuat1a.intranet.mydomain.com after failed stop: rc=-2 (update=INFINITY, time=1269021333)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: match_graph_event:272 - Triggered transition abort (complete=0, tag=lrm_rsc_op, id=IPaddr_172_28_185_49_stop_0, magic=2:-2;6:5:0:888fa84e-3267-409e-966b-2ab01e579c0f, cib=0.23.16) : Event failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort priority upgraded from 0 to 1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: update_abort_priority: Abort action done superceeded by restart
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: match_graph_event: Action IPaddr_172_28_185_49_stop_0 (6) confirmed on dbsuat1a.intranet.mydomain.com (rc=4)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 5 (Complete=1, Pending=0, Fired=0, Skipped=4, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-337.bz2): Stopped
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 5 is now complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for fail-count-IPaddr_172_28_185_49
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: All 2 cluster nodes are eligible to run resources.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: fail-count-IPaddr_172_28_185_49 (INFINITY)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 24: fail-count-IPaddr_172_28_185_49=INFINITY
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 53: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: find_hash_entry: Creating hash entry for last-failure-IPaddr_172_28_185_49
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.17) : Transient attribute: update
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_trigger_update: Sending flush op to all hosts for: last-failure-IPaddr_172_28_185_49 (1269021333)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_perform_update: Sent update 27: last-failure-IPaddr_172_28_185_49=1269021333
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-28, seq=2, quorate=1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: abort_transition_graph: te_update_diff:146 - Triggered transition abort (complete=1, tag=transient_attributes, id=db80324b-c9de-4995-a66a-eedf93abb42c, magic=NA, cib=0.23.18) : Transient attribute: update
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 54: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke: Query 55: Requesting the current CIB: S_POLICY_ENGINE
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_pe_invoke_callback: Invoking the PE: ref=pe_calc-dc-1269021333-29, seq=2, quorate=1
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: handle_response: pe_calc calculation pe_calc-dc-1269021333-28 is obsolete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 6: PEngine Input stored in: /usr/var/lib/pengine/pe-input-338.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgrading transitional-0.6-style configuration to pacemaker-1.0 with /usr/share/pacemaker/upgrade06.xsl
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: update_validation: Transformation /usr/share/pacemaker/upgrade06.xsl successful
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: update_validation: Upgraded from transitional-0.6 to pacemaker-1.0 validation
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: cli_config_update: Your configuration was internally updated to the latest version (pacemaker-1.0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: unpack_config: Node scores: 'red' = -INFINITY, 'yellow' = 0, 'green' = 0
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1a.intranet.mydomain.com is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: unpack_rsc_op: Processing failed op IPaddr_172_28_185_49_stop_0 on dbsuat1a.intranet.mydomain.com: unknown exec error (-2)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_add_running: resource IPaddr_172_28_185_49 isnt managed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: determine_online_status: Node dbsuat1b.intranet.mydomain.com is online
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: native_print: IPaddr_172_28_185_49 (ocf::heartbeat:IPaddr): Started dbsuat1a.intranet.mydomain.com (unmanaged) FAILED
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: get_failcount: IPaddr_172_28_185_49 has failed 1000000 times on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: WARN: common_apply_stickiness: Forcing IPaddr_172_28_185_49 away from dbsuat1a.intranet.mydomain.com after 1000000 failures (max=1000000)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: native_color: Unmanaged resource IPaddr_172_28_185_49 allocated to 'nowhere': failed
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: stage6: Scheduling Node dbsuat1a.intranet.mydomain.com for shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: notice: LogActions: Leave resource IPaddr_172_28_185_49 (Started unmanaged)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Transition 7: PEngine Input stored in: /usr/var/lib/pengine/pe-input-339.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: process_pe_message: Configuration WARNINGs found during PE processing. Please run "crm_verify -L" to identify issues.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: unpack_graph: Unpacked transition 7: 1 actions in 1 synapses
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_invoke: Processing graph 7 (ref=pe_calc-dc-1269021333-29) derived from /usr/var/lib/pengine/pe-input-339.bz2
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: Executing crm-event (10): do_shutdown on dbsuat1a.intranet.mydomain.com
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_crm_command: crm-event (10) is a local shutdown
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: run_graph: ====================================================
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: run_graph: Transition 7 (Complete=1, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/usr/var/lib/pengine/pe-input-339.bz2): Complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: te_graph_trigger: Transition 7 is now complete
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_state_transition: State transition S_TRANSITION_ENGINE -> S_STOPPING [ input=I_STOP cause=C_FSA_INTERNAL origin=notify_crmd ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_dc_release: DC role released
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Transitioner is now inactive
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com pengine: [4714]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_te_control: Disconnecting STONITH...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: notice: Not currently connected.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: register_fsa_input_adv: do_shutdown stalled the FSA with pending inputs
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: WARN: do_log: FSA: Input I_RELEASE_SUCCESS from do_dc_release() received in state S_STOPPING
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Terminating the pengine
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: stop_subsystem: Sent -TERM to pengine: [4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: Waiting for subsystems to exit
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmdManagedChildDied: Process pengine:[4714] exited (signal=0, exitcode=0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_msg_dispatch: Received HUP from pengine:[4714]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: pe_connection_destroy: Connection to the Policy Engine released
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_shutdown: All subsystems stopped, continuing
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: ERROR: verify_stopped: Resource IPaddr_172_28_185_49 was active at shutdown. You may ignore this error if it is unmanaged.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_lrm_control: Disconnected from the LRM
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4531) removed from ccm
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_ha_control: Disconnected from Heartbeat
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_cib_control: Disconnecting CIB
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_readwrite: We are now in R/O mode
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: crmd_cib_connection_destroy: Connection to the CIB terminated...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: Performing A_EXIT_0 - gracefully exiting the CRMd
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: free_mem: Dropping I_TERMINATE: [ state=S_STOPPING cause=C_FSA_INTERNAL origin=do_stop ]
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com crmd: [4531]: info: do_exit: [crmd] stopped (0)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/attrd process group 4530 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_shutdown: Exiting
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: main: Exiting...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com attrd: [4530]: info: attrd_cib_connection_destroy: Connection to the CIB terminated...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/stonithd process group 4529 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com stonithd: [4529]: notice: /usr/lib64/heartbeat/stonithd normally quit.
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/lrmd -r process group 4528 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: info: lrmd is shutting down
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com lrmd: [4528]: WARN: resource IPaddr_172_28_185_49 is left in UNKNOWN status.(last op stop finished without LRM_OP_DONE status.)
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/cib process group 4527 with signal 15
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: crm_signal_dispatch: Invoking handler for signal 15: Terminated
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_shutdown: Disconnected 0 clients
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_disconnect: All clients disconnected...
> >>>>>> Mar 19 13:55:33 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: initiate_exit: Sending disconnect notification to 2 peers...
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_shutdown_req: Shutdown ACK from dbsuat1b.intranet.mydomain.com
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: cib_process_shutdown_req: Disconnecting heartbeat
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: terminate_cib: Exiting...
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: cib_process_request: Operation complete: op cib_shutdown_req for section 'all' (origin=dbsuat1b.intranet.mydomain.com/dbsuat1b.intranet.mydomain.com/(null), version=0.0.0): ok (rc=0)
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: ha_msg_dispatch: Lost connection to heartbeat service.
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com cib: [4527]: info: main: Done
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: client (pid=4527) removed from ccm
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing /usr/lib64/heartbeat/ccm process group 4526 with signal 15
> >>>>>> Mar 19 13:55:34 DBSUAT1A.intranet.mydomain.com ccm: [4526]: info: received SIGTERM, going to shut down
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBFIFO process 4522 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBWRITE process 4523 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: killing HBREAD process 4524 with signal 15
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4524 exited. 3 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4523 exited. 2 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: Core process 4522 exited. 1 remaining
> >>>>>> Mar 19 13:55:35 DBSUAT1A.intranet.mydomain.com heartbeat: [4519]: info: dbsuat1a.intranet.mydomain.com Heartbeat shutdown complete.
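The key events in the log above are the lrmd stop timeout and SIGTERM kill: the IPaddr agent's stop action hung for the full 20-second timeout, so pacemaker marked the stop as failed and left the resource unmanaged (which is why the address stayed on eth0:0). A stop hang like this can usually be reproduced outside the cluster by invoking the resource agent by hand with the same parameter. The OCF paths below are the conventional locations and an assumption for this system; verify them first (e.g. with rpm -ql heartbeat | grep IPaddr).

```shell
# Hypothetical manual test of the IPaddr agent's stop action, run as
# root on the node that is holding the VIP.
export OCF_ROOT=/usr/lib/ocf               # conventional default; verify
export OCF_RESKEY_ip=172.28.185.49         # same parameter the CIB passes
time /usr/lib/ocf/resource.d/heartbeat/IPaddr stop
echo "rc=$?"
# If this hangs for ~20 s (matching the lrmd timeout in the log), the
# problem is in the agent itself, not in pacemaker's handling of it.
```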
> >>>>>>
> >>>>>> my ha.cf:
> >>>>>> # Logging
> >>>>>> debug 1
> >>>>>> debugfile /var/log/ha-debug
> >>>>>> logfile /var/log/ha-log
> >>>>>> logfacility local0
> >>>>>> #use_logd true
> >>>>>> #logfacility daemon
> >>>>>>
> >>>>>> # Misc Options
> >>>>>> traditional_compression off
> >>>>>> compression bz2
> >>>>>> coredumps true
> >>>>>>
> >>>>>> # Communications
> >>>>>> udpport 691
> >>>>>> bcast eth0
> >>>>>> ##autojoin any
> >>>>>> autojoin none
> >>>>>>
> >>>>>> # Thresholds (in seconds)
> >>>>>> keepalive 1
> >>>>>> warntime 6
> >>>>>> deadtime 10
> >>>>>> initdead 15
> >>>>>>
> >>>>>> node dbsuat1a.intranet.mydomain.com
> >>>>>> node dbsuat1b.intranet.mydomain.com
> >>>>>> #enable pacemaker
> >>>>>> crm yes
> >>>>>> #enable STONITH
> >>>>>> #crm respawn
> >>>>>>
> >>>>>> my haresources:
> >>>>>> DBSUAT1A.intranet.mydomain.com 172.28.185.49
> >>>>>
> >>>>> I don't have very good advice, but you shouldn't use haresources
> >>>>> anymore. You should use pacemaker for configuring the cluster.
> >>>>>
> >>>>> You have said that you wish to use pacemaker (crm) with this line of
> >>>>> your config: crm yes
> >>>>>
> >>>>> Remove the haresources file, restart heartbeat on both nodes and
> >>>>> redo the tests.
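The migration that this thread describes can be summarised as a short procedure. This is only a sketch: the CIB directory /var/lib/heartbeat/crm is the usual default and the haresources path assumes a standard /etc/ha.d layout; confirm both on your install before running anything.

```shell
# Sketch of the haresources -> CIB migration discussed in this thread.
# Run on both nodes; paths are assumptions - verify them first.
service heartbeat stop

# Drop the stale generated CIB (cib.xml and cib.xml.sig):
rm -f /var/lib/heartbeat/crm/cib.xml /var/lib/heartbeat/crm/cib.xml.sig

# Regenerate the CIB from the current haresources file:
/usr/lib64/heartbeat/haresources2cib.py

# Take haresources out of play so only pacemaker manages resources,
# and make sure ha.cf still contains:  crm yes
mv /etc/ha.d/haresources /etc/ha.d/haresources.off

service heartbeat start
```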
--
Best regards,
Marian Marinov
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
