Hi,

I have a primary/backup Linux-HA v2.0.8 setup managing OpenSer and two IP addresses.

If I make a mistake in a config file for a resource controlled by Linux-HA (OpenSer), and the resource then dies and a restart is attempted, the restart fails and the resource migrates to the backup node as expected. However, once I fix the problem so that the resource could start again on the primary, I can never get Linux-HA to migrate the resource back.

I don't think this has anything to do with scoring, because when I leave my config files intact and manually kill the service 13 times on box01 (the reason for 13 is in my included cib.xml), the resources migrate from box01 to box02 as expected. Setting the fail count back below 13 causes the service to migrate back, also as expected.
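A rough sketch of the arithmetic that would produce the threshold of 13 from the values in the cib.xml below. It rests on two assumptions that the post does not confirm: that default-resource-stickiness is counted once per member of the three-resource group, and that a tie leaves the group where it is. The function names are hypothetical and this is not taken from the Linux-HA source.

```python
# Hypothetical model of the placement arithmetic; values come from the cib.xml.
LOCATION = {"box01": 100, "box02": 10}   # rsc_location rule scores
STICKINESS = 10                          # default-resource-stickiness
FAILURE_STICKINESS = -10                 # default-resource-failure-stickiness
GROUP_SIZE = 3                           # two IPaddr2 primitives plus OpenSer


def node_score(node, fail_count, is_current):
    """Score of one node for the group under the assumed model."""
    score = LOCATION[node] + FAILURE_STICKINESS * fail_count
    if is_current:
        # Assumption: stickiness applies once per group member on the
        # node that currently runs the group.
        score += STICKINESS * GROUP_SIZE
    return score


def stays_on_box01(fail_count):
    """True while box01 still scores at least as high as box02."""
    box01 = node_score("box01", fail_count, is_current=True)
    box02 = node_score("box02", 0, is_current=False)
    return box01 >= box02  # assumption: a tie does not trigger a move


for count in (12, 13):
    print(count, node_score("box01", count, True), stays_on_box01(count))
```

Under this model box01 scores 10 after 12 failures, which ties with box02, and 0 after 13 failures, which is where the group would move.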

However, when I try to fail back to a system whose broken OpenSer config files have since been fixed, the resources won't come back no matter how low I set the fail count. Is there another variable or INFINITY constraint somewhere that gets set when a resource fails to start and that keeps the resources away? What can I do when I want Linux-HA to retry migrating the service back to a recently hand-fixed primary?

Additionally, I followed the advice under "Resetting Failure Counts" in the V2 FAQ ( http://linux-ha.org/v2/faq ) where it suggests:

crm_failcount -D -U nodeA -r my_rsc

Rather than resetting the failure count, this just torches it, so that you can't even read it back with the query command given in the next step of the same example. I found that explicitly setting the count back to 0 with:

crm_failcount -v 0 -U box01 -r OpenSer

worked much better and allowed me to push resources back and forth just by moving the fail count up and down.
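For stepping the count up and down repeatedly, the command lines can be generated rather than typed. This is a minimal sketch with a hypothetical helper name; it only builds the argument list and does not run anything. The `-v`, `-U` and `-r` flags are the ones used above, while `-G` is assumed to be the query flag from the FAQ example.

```python
def failcount_argv(node, resource, value=None):
    """Build a crm_failcount command line.

    With value=None the result queries the count (assumed flag: -G);
    otherwise it sets the count to the given value with -v.
    """
    argv = ["crm_failcount"]
    if value is None:
        argv.append("-G")
    else:
        argv += ["-v", str(value)]
    argv += ["-U", node, "-r", resource]
    return argv


# Reproduces the reset command shown above.
print(" ".join(failcount_argv("box01", "OpenSer", 0)))
# The matching query for the same node and resource.
print(" ".join(failcount_argv("box01", "OpenSer")))
```

The returned list can be handed to `subprocess.run` on a cluster node if actually executing the command is wanted.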

Thanks.

-Anders


<cib admin_epoch="1" have_quorum="true" num_peers="1" cib_feature_revision="1.3" ignore_dtd="false" ccm_transition="3" generated="true" dc_uuid="9052abe5-87ee-4400-a008-c5f13205e94b" epoch="15" num_updates="606" cib-last-written="Mon Nov 12 22:37:10 2007">
  <configuration>
    <crm_config>
      <cluster_property_set id="cluster-property-set">
        <attributes>
          <nvpair id="short_resource_names" name="short_resource_names" value="true"/>
          <nvpair id="pe-input-series-max" name="pe-input-series-max" value="-1"/>
          <nvpair id="default-resource-stickiness" name="default-resource-stickiness" value="10"/>
          <nvpair id="default-resource-failure-stickiness" name="default-resource-failure-stickiness" value="-10"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="9052abe5-87ee-4400-a008-c5f13205e94b" uname="box01" type="normal"/>
      <node id="47658455-4da2-48d4-a8da-419b2f93f039" uname="box02" type="normal"/>
    </nodes>
    <resources>
      <group id="IPaddr2_OpenSer_group">
        <primitive id="IPaddr2-10.1.53.235" class="ocf" type="IPaddr2" provider="heartbeat">
          <operations>
            <op id="ipaddr2-10.1.53.235-monitor" name="monitor" interval="5s" timeout="3s"/>
          </operations>
          <instance_attributes id="IPaddr2-10.1.53.235-attributes">
            <attributes>
              <nvpair id="ipaddr2-10.1.53.235-ip" name="ip" value="10.1.53.235"/>
              <nvpair id="ipaddr2-10.1.53.235-broadcast" name="broadcast" value="10.1.53.255"/>
              <nvpair id="ipaddr2-10.1.53.235-cidr_netmask" name="cidr_netmask" value="24"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="IPaddr2-10.1.53.236" class="ocf" type="IPaddr2" provider="heartbeat">
          <operations>
            <op id="ipaddr2-10.1.53.236-monitor" name="monitor" interval="5s" timeout="3s"/>
          </operations>
          <instance_attributes id="IPaddr2-10.1.53.236-attributes">
            <attributes>
              <nvpair id="ipaddr2-10.1.53.236-ip" name="ip" value="10.1.53.236"/>
              <nvpair id="ipaddr2-10.1.53.236-broadcast" name="broadcast" value="10.1.53.255"/>
              <nvpair id="ipaddr2-10.1.53.236-cidr_netmask" name="cidr_netmask" value="24"/>
            </attributes>
          </instance_attributes>
        </primitive>
        <primitive id="OpenSer" class="ocf" type="OpenSer" provider="bandwidth.com">
          <operations>
            <op id="openser-start" name="start" timeout="5s"/>
            <op id="openser-stop" name="stop" timeout="3s"/>
            <op id="openser-monitor" name="monitor" interval="10s" timeout="3s">
              <instance_attributes id="monitor_10s">
                <attributes>
                  <nvpair id="openser-monitor-ip" name="ip" value="127.0.0.1"/>
                </attributes>
              </instance_attributes>
            </op>
          </operations>
        </primitive>
      </group>
    </resources>
    <constraints>
      <rsc_location id="OpenSer_resource_location" rsc="OpenSer">
        <rule id="rule_box01" score="100">
          <expression id="expression_uname_eq_box01" attribute="#uname" operation="eq" value="box01"/>
        </rule>
        <rule id="rule_box02" score="10">
          <expression id="expression_uname_eq_box02" attribute="#uname" operation="eq" value="box02"/>
        </rule>
      </rsc_location>
    </constraints>
  </configuration>
</cib>

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
