Hi,
I have a primary/backup v2.0.8 setup monitoring OpenSer and two IP
addresses.
If I make a mistake in a config file for a resource that Linux-HA
controls (OpenSer) and the resource then dies for whatever reason, the
restart attempt fails and the resource migrates to the backup node, as
expected. However, once I fix the problem so that the resource could
start again on the primary, I can never get Linux-HA to migrate the
resource back.
I don't think this has anything to do with scoring, because when I leave
my config files intact and manually kill the service 13 times on box01
(the reason for 13 is in my included cib.xml), the resources migrate
from box01 to box02 as expected. Setting the fail count back below 13
causes the service to migrate back, also as expected.
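
As a rough illustration of where a threshold of 13 could come from, here is a small sketch using the numbers in the cib.xml below. It rests on an assumption that is not confirmed anywhere in this post: that the stickiness of 10 is counted once for each of the three group members, so box01 starts at 100 + 30 and loses 10 per failure, while box02 sits at its location score of 10. The function names are made up for the example, and the sketch only models the move away from box01, not the failback.

```python
# Values copied from the cib.xml in this post.
LOCATION_SCORE = {"box01": 100, "box02": 10}
RESOURCE_STICKINESS = 10    # default-resource-stickiness
FAILURE_STICKINESS = -10    # default-resource-failure-stickiness
GROUP_MEMBERS = 3           # two IPaddr2 primitives plus OpenSer


def score_on_active_node(node, fail_count):
    """Score of the node currently running the group.

    ASSUMPTION: stickiness is added once per group member. This is a
    guess that happens to reproduce the figure 13, not a statement of
    how the policy engine is known to compute scores.
    """
    return (LOCATION_SCORE[node]
            + GROUP_MEMBERS * RESOURCE_STICKINESS
            + fail_count * FAILURE_STICKINESS)


def first_fail_count_that_migrates(active="box01", standby="box02"):
    """Smallest fail count at which the standby node outscores the active one."""
    fail_count = 0
    # The group stays put while the active node scores at least as high
    # as the standby node's plain location score.
    while score_on_active_node(active, fail_count) >= LOCATION_SCORE[standby]:
        fail_count += 1
    return fail_count


if __name__ == "__main__":
    print(first_fail_count_that_migrates())  # prints 13
```

Under that assumption, 12 failures leave box01 tied with box02 at 10, and the 13th failure drops box01 to 0, which is when the group moves.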
However, when I try to fail back to a system whose broken OpenSer config
files have since been fixed, I can't get the resources to come back no
matter how low I set the fail count. Is there another variable or
INFINITY constraint somewhere that gets set when a resource fails to
start and that keeps the resources away? What can I do when I want
Linux-HA to retry migrating the service back to a recently hand-fixed
primary?
Additionally, I followed the advice under "Resetting Failure Counts" in
the V2 FAQ (http://linux-ha.org/v2/faq), which suggests:
    crm_failcount -D -U nodeA -r my_rsc
Rather than resetting the failure count, this just torches it in such a
way that you can't even read it with the query command given in the next
step of the same example. I found that statically setting the count back
to 0 with:
    crm_failcount -v 0 -U box01 -r OpenSer
worked much better and allowed me to push resources back and forth just
by moving the fail count up and down.
Thanks.
-Anders
<cib admin_epoch="1" have_quorum="true" num_peers="1"
     cib_feature_revision="1.3" ignore_dtd="false" ccm_transition="3"
     generated="true" dc_uuid="9052abe5-87ee-4400-a008-c5f13205e94b"
     epoch="15" num_updates="606" cib-last-written="Mon Nov 12 22:37:10 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cluster-property-set">
        <attributes>
          <nvpair id="short_resource_names" name="short_resource_names" value="true"/>
          <nvpair id="pe-input-series-max" name="pe-input-series-max" value="-1"/>
          <nvpair id="default-resource-stickiness"
                  name="default-resource-stickiness" value="10"/>
          <nvpair id="default-resource-failure-stickiness"
                  name="default-resource-failure-stickiness" value="-10"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="9052abe5-87ee-4400-a008-c5f13205e94b" uname="box01" type="normal"/>
      <node id="47658455-4da2-48d4-a8da-419b2f93f039" uname="box02" type="normal"/>
    </nodes>
    <resources>
      <group id="IPaddr2_OpenSer_group">
        <primitive id="IPaddr2-10.1.53.235" class="ocf" type="IPaddr2"
                   provider="heartbeat">
          <operations>
            <op id="ipaddr2-10.1.53.235-monitor" name="monitor"
                interval="5s" timeout="3s"/>
          </operations>
          <instance_attributes id="IPaddr2-10.1.53.235-attributes">
            <attributes>
              <nvpair id="ipaddr2-10.1.53.235-ip" name="ip" value="10.1.53.235"/>
              <nvpair id="ipaddr2-10.1.53.235-broadcast"
                      name="broadcast" value="10.1.53.255"/>
              <nvpair id="ipaddr2-10.1.53.235-cidr_netmask"
                      name="cidr_netmask" value="24"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="IPaddr2-10.1.53.236" class="ocf" type="IPaddr2"
                   provider="heartbeat">
          <operations>
            <op id="ipaddr2-10.1.53.236-monitor" name="monitor"
                interval="5s" timeout="3s"/>
          </operations>
          <instance_attributes id="IPaddr2-10.1.53.236-attributes">
            <attributes>
              <nvpair id="ipaddr2-10.1.53.236-ip" name="ip" value="10.1.53.236"/>
              <nvpair id="ipaddr2-10.1.53.236-broadcast"
                      name="broadcast" value="10.1.53.255"/>
              <nvpair id="ipaddr2-10.1.53.236-cidr_netmask"
                      name="cidr_netmask" value="24"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="OpenSer" class="ocf" type="OpenSer"
                   provider="bandwidth.com">
          <operations>
            <op id="openser-start" name="start" timeout="5s"/>
            <op id="openser-stop" name="stop" timeout="3s"/>
            <op id="openser-monitor" name="monitor" interval="10s" timeout="3s">
              <instance_attributes id="monitor_10s">
                <attributes>
                  <nvpair id="openser-monitor-ip" name="ip" value="127.0.0.1"/>
                </attributes>
              </instance_attributes>
            </op>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="OpenSer_resource_location" rsc="OpenSer">
        <rule id="rule_box01" score="100">
          <expression id="expression_uname_eq_box01" attribute="#uname"
                      operation="eq" value="box01"/>
        </rule>
        <rule id="rule_box02" score="10">
          <expression id="expression_uname_eq_box02" attribute="#uname"
                      operation="eq" value="box02"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>