So, to see if I understand correctly, here are a couple of scenarios:

Assume a failure-timeout of 15 minutes.
1. Let's assume I have 2 failures within 5 minutes and then no failures
for 20 minutes afterwards. After those 20 minutes I have a failure. Are
you saying no failover will occur at that point and that the failcount
will NOT be reset?
2. If I understand point #1 correctly, what if I have 2 failures again
within 5 minutes and then, 20 minutes later, 3 successive failures
within 10 minutes? Will the resource fail over, or will it continue to
ignore the failcounts? I guess what I'm really asking is: is the
15-minute failure-timeout a rolling window that gets reset, or is it a
one-shot deal, i.e. once the failcount is ignored the first time, it is
always ignored from that point on?
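To make the two scenarios above concrete, here is a toy Python model of how the fail-count might behave under each version. It is a sketch, not Pacemaker code, and it rests on stated assumptions: the "1.0" reading takes Andrew's note literally, so the stored fail-count is never cleared and is only ignored while the most recent failure is older than failure-timeout, but a new failure still adds to it and refreshes the last-failure time; the "1.1" reading assumes an expired count is cleared immediately, before the next failure is counted (a real cluster only notices expiry when it next re-checks). The failure times, in minutes, are one reading of the scenarios, and first_migration is a hypothetical helper.

```python
from typing import List, Optional


def first_migration(failure_times: List[float], version: str,
                    threshold: int = 3, timeout: float = 15.0) -> Optional[float]:
    """Return the minute at which the resource would migrate, or None.

    Toy model, not Pacemaker source. Times are minutes, in ascending order.
      "1.0": the stored fail-count is never cleared. It is only ignored while
             the last failure is older than `timeout`, so a new failure adds
             to the old count and refreshes last-failure, making it count again.
      "1.1": an expired fail-count is cleared (assumed instantly) before the
             next failure is counted.
    """
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unknown version: {version!r}")
    fail_count = 0
    last_failure: Optional[float] = None
    for now in failure_times:
        expired = last_failure is not None and now > last_failure + timeout
        if version == "1.1" and expired:
            fail_count = 0      # 1.1 reading: expiry resets the counter
        fail_count += 1         # both readings: record this failure
        last_failure = now      # ...and refresh the last-failure timestamp
        # Right after a failure the count cannot be expired, so the effective
        # count equals the stored count and is compared with the threshold.
        if fail_count >= threshold:
            return now
    return None


if __name__ == "__main__":
    scenario_1 = [0, 5, 25]          # 2 failures in 5 min, one more 20 min later
    scenario_2 = [0, 5, 25, 30, 35]  # same start, then 3 failures within 10 min
    for name, times in (("scenario 1", scenario_1), ("scenario 2", scenario_2)):
        for version in ("1.0", "1.1"):
            print(name, version, first_migration(times, version))
```

Under these assumptions the 1.0 reading migrates at minute 25 in both scenarios, because the third failure ever recorded pushes the never-cleared count to 3, while the 1.1 reading never migrates in scenario 1 and migrates at minute 35 in scenario 2.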

Thank you, Andrew.

Mike
Andrew Beekhof wrote:
> On Wed, May 19, 2010 at 5:22 PM, mike <mgbut...@nbnet.nb.ca> wrote:
>   
>> Andrew Beekhof wrote:
>>     
>>>> which is what my DBA was looking for. He wants MySQL to fail over if
>>>> there are 3 successive failures of MySQL, but only if those successive
>>>> failures occur within 15 minutes.
>>>>
>>>>         
>>> You want migration-threshold=3 and failure-timeout=900000 (15 * 60 * 1000).
>>>
>>>       
>> Thanks Andrew,
>> I placed the failure-timeout=900000 piece in my resource section like so:
>>
>> <primitive class="ocf" id="ldirectord" provider="heartbeat" type="ldirectord">
>>   <instance_attributes id="ldirectord-instance_attributes">
>>     <nvpair id="ldirectord-instance_attributes-configfile" name="configfile" value="/usr/etc/ha.d/ldirectord.cf"/>
>>     <nvpair id="ldirectord-options-migration-threshold" name="migration-threshold" value="3"/>
>>     <nvpair id="ldirectord-options-failure-timeout" name="failure-timeout" value="900s"/>
>>   </instance_attributes>
>>   <operations>
>>     <op id="ldirectord-monitor-2m" interval="2m" name="monitor" timeout="20s"/>
>>     <op id="ldirectord-start-0" interval="0" name="start" timeout="90s"/>
>>     <op id="ldirectord-stop-0" interval="0" name="stop" timeout="100s"/>
>>   </operations>
>> </primitive>
>>
>>
>> crm_mon shows this initially:
>> Migration summary:
>> * Node lvsuat1a.intranet.aeroplan.com:
>>   ldirectord: migration-threshold=3 fail-count=2 last-failure=' ~P'
>>
>>
>> but it never changes. The failcount never resets. Am I missing something?
>>     
>
> No, it won't reset in 1.0; that's something new in 1.1.
> In 1.0 the fail-count is ignored after the specified interval.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>
>   
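Andrew's suggested failure-timeout=900000 (15 * 60 * 1000) and the value="900s" in the configuration above describe the same 15 minutes, provided 900000 is read as milliseconds. The hypothetical duration_ms helper below just checks that arithmetic; its unit table (ms, s, m/min, h) is an assumption for illustration, not Pacemaker's own parser, and it deliberately rejects unitless values rather than guess which unit a bare number implies.

```python
import re

# Hypothetical unit table for illustration; not taken from Pacemaker.
UNIT_MS = {"ms": 1, "s": 1_000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def duration_ms(value: str) -> int:
    """Convert a duration such as '900s' or '15min' to milliseconds.

    Unitless values are rejected on purpose, so the example never has to
    guess which unit a bare number implies.
    """
    match = re.fullmatch(r"(\d+)(ms|s|min|m|h)", value.strip())
    if match is None:
        raise ValueError(f"expected <number><unit>, got {value!r}")
    number, unit = match.groups()
    return int(number) * UNIT_MS[unit]


if __name__ == "__main__":
    andrew_ms = 15 * 60 * 1000               # Andrew's arithmetic: 900000
    print(andrew_ms == duration_ms("900s"))  # the value in the config above
    print(duration_ms("15min") == duration_ms("900000ms"))
```

Writing the unit explicitly, as the configuration above does with "900s", keeps the intended interval readable at a glance.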
