Hi all,
I have heartbeat 2.0.8 with the following configuration:
Node1 score = 101500
Node2 score = 101000
default_resource_stickiness = -200
default_resource_failure_stickiness = -351
It should fail over after this many retries:
(101500 - 101000) - (-200) = 700
700 / abs(-351) = 1.99, i.e. about 2 failures
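The arithmetic above can be sketched as follows (a minimal illustration of how I understand the scoring; the function and variable names are mine, not Heartbeat internals):

```python
# Sketch of the failover-threshold arithmetic from the message above.
# Names are illustrative only, not Heartbeat's actual code.

def failures_to_fail_over(score_active, score_passive,
                          stickiness, failure_stickiness):
    """Failcount at which the passive node's score overtakes the active one."""
    # Advantage the active node must lose before the resource moves:
    advantage = (score_active - score_passive) - stickiness
    # Each failure subtracts abs(failure_stickiness) from the active node's score.
    return advantage / abs(failure_stickiness)

n = failures_to_fail_over(101500, 101000, -200, -351)
print(round(n, 2))  # 1.99 -> the resource should move on the 2nd failure
```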
The initial failcount on node 1 and node 2 is zero.
1. Node 1 fails over to node 2 with failcount 2
2. Node 2 fails over to node 1 with failcount 2
3. Node 1 fails over to node 2 with failcount 5
4. Node 2 fails over to node 1 with failcount 5
5. Node 1 fails over to node 2 with failcount 9
6. Node 2 fails over to node 1 with failcount 7
Is this the expected behavior? Why does the failcount in iteration 3
increase to 5 instead of 4, i.e. (2+2)? And by iteration 5 it has shot
up by 4 more, to 9.
How do I keep the retry count constant at 2?
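As far as I understand, the failcount is stored in the CIB status section and accumulates across failures; a successful failover does not reset it, which would explain the growing numbers. If so, clearing it explicitly after each recovery should keep the effective retry count at 2. A sketch (please verify the exact crm_failcount flags on your 2.0.8 build; it needs a running CRM):

```shell
# Sketch: inspect and reset the failcount for res_ttsvc on each node.
# Flag spellings may differ between Heartbeat releases.

# Show the current failcount for the resource on a given node:
crm_failcount -G -r res_ttsvc -U wabtectestconfig.patni.com

# Delete (reset) it, so the next failure starts counting from zero:
crm_failcount -D -r res_ttsvc -U wabtectestconfig.patni.com
crm_failcount -D -r res_ttsvc -U wabtecwl1.patni.com
```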
Am I correct in choosing version 2.0.8, or should I choose a different version?
Regards,
Chetan
http://www.patni.com
World-Wide Partnerships. World-Class Solutions.
<?xml version="1.0" ?>
<cib>
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<attributes>
<nvpair id="symmetric_cluster" name="symmetric_cluster"
value="true"/>
<nvpair id="no_quorum_policy" name="no_quorum_policy"
value="stop"/>
<nvpair id="default_resource_stickiness"
name="default_resource_stickiness" value="-200"/>
<nvpair id="default_resource_failure_stickiness"
name="default_resource_failure_stickiness" value="-351"/>
<nvpair id="stonith_enabled" name="stonith_enabled"
value="false"/>
<nvpair id="stop_orphan_resources"
name="stop_orphan_resources" value="true"/>
<nvpair id="stop_orphan_actions" name="stop_orphan_actions"
value="true"/>
<nvpair id="remove_after_stop" name="remove_after_stop"
value="true"/>
<nvpair id="is_managed_default" name="is_managed_default"
value="true"/>
<nvpair id="short_resource_names" name="short_resource_names"
value="true"/>
</attributes>
</cluster_property_set>
</crm_config>
<nodes/>
<resources>
<primitive id="res_ttsvc" class="heartbeat" type="ttmgr.sh"
provider="heartbeat">
<instance_attributes id="res_ttsvc_instance_attrs">
<attributes/>
</instance_attributes>
<operations>
<op id="tt_start_1" name="start" description="begin op"
timeout="2s" start_delay="0" disabled="false" on_fail="restart"/>
<op id="tt_status_1" name="monitor" description="check state"
interval="2s" timeout="3s" start_delay="0" disabled="false"
on_fail="restart"/>
<op id="tt_stop_1" name="stop" description="stop status check"
timeout="2s" start_delay="0" disabled="false" on_fail="restart"/>
</operations>
</primitive>
</resources>
<constraints>
<rsc_location id="place_testconfig" rsc="res_ttsvc">
<rule id="prefered_testconfig" score="101500">
<expression id="e1" attribute="#uname" operation="eq"
value="wabtectestconfig.patni.com"/>
</rule>
</rsc_location>
<rsc_location id="place_wl1config" rsc="res_ttsvc">
<rule id="prefered_wl1config" score="101000">
<expression id="e2" attribute="#uname" operation="eq"
value="wabtecwl1.patni.com"/>
</rule>
</rsc_location>
</constraints>
</configuration>
<status/>
</cib>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems