So, to check that I understand correctly, here are a couple of scenarios. Assume a failure-timeout of 15 minutes.

1. Let's assume I have 2 failures within 5 minutes and then no failures for the next 20 minutes. After those 20 minutes I have another failure. Are you saying that no failover will occur at that point and that the failcount will NOT be reset?

2. If I understand point #1 correctly, what if I have 2 failures again within 5 minutes and then, 20 minutes later, 3 successive failures within 10 minutes? Will the resource fail over, or will it continue to ignore the failcounts?

I guess what I'm really asking is: is the 15-minute failure-timeout a rolling thing that gets reset, or is it a one-shot deal, i.e. once ignored the first time, always ignored from that point on?
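To make the two readings concrete, here is a rough Python sketch of how I picture them. This is only my mental model of the question, not Pacemaker code: the function names `rolling_failover` and `one_shot_failover` are made up for illustration, times are in minutes, and neither function is claimed to match what 1.0 or 1.1 actually does.

```python
from typing import Optional, Sequence

THRESHOLD = 3      # migration-threshold from my config
TIMEOUT_MIN = 15   # failure-timeout, expressed in minutes


def rolling_failover(failures: Sequence[int]) -> Optional[int]:
    """'Rolling' reading: a quiet gap longer than the timeout restarts the count.

    Returns the minute at which a failover would trigger, or None.
    """
    streak = 0
    last = None
    for t in failures:
        if last is not None and t - last > TIMEOUT_MIN:
            streak = 0  # old failures have expired, start counting again
        streak += 1
        last = t
        if streak >= THRESHOLD:
            return t
    return None


def one_shot_failover(failures: Sequence[int]) -> Optional[int]:
    """'One-shot' reading: once the count has been ignored, it stays ignored."""
    count = 0
    last = None
    ignored_forever = False
    for t in failures:
        if last is not None and t - last > TIMEOUT_MIN:
            ignored_forever = True  # the timeout fired once, never counts again
        last = t
        if ignored_forever:
            continue
        count += 1
        if count >= THRESHOLD:
            return t
    return None


scenario_1 = [0, 5, 25]          # 2 failures, 20 quiet minutes, 1 failure
scenario_2 = [0, 5, 25, 30, 35]  # 2 failures, 20 quiet minutes, 3 failures

print(rolling_failover(scenario_1), one_shot_failover(scenario_1))
print(rolling_failover(scenario_2), one_shot_failover(scenario_2))
```

Under the rolling reading, scenario 2 would fail over on the last failure (minute 35); under the one-shot reading, it never would. Which of these, if either, is what actually happens is exactly what I'm asking.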
Thank you Andrew

Mike

Andrew Beekhof wrote:
> On Wed, May 19, 2010 at 5:22 PM, mike <mgbut...@nbnet.nb.ca> wrote:
>
>> Andrew Beekhof wrote:
>>
>>>> which is what my DBA was looking for. He wants mysql to failover if
>>>> there are 3 successive failures of MySQL but only if those successive
>>>> failures occur within 15 minutes.
>>>>
>>>>
>>> You want migration-threshold=3 and failure-timeout=900000 (15 * 60 * 1000
>>>
>>>
>> Thanks Andrew,
>> I placed the failure-timeout=900000 piece in my resource section like so:
>>
>> <primitive class="ocf" id="ldirectord" provider="heartbeat"
>> type="ldirectord">
>>   <instance_attributes id="ldirectord-instance_attributes">
>>     <nvpair id="ldirectord-instance_attributes-configfile"
>> name="configfile" value="/usr/etc/ha.d/ldirectord.cf"/>
>>     <nvpair id="ldirectord-options-migration-threshold"
>> name="migration-threshold" value="3"/>
>>     <nvpair id="ldirectord-options-failure-timeout"
>> name="failure-timeout" value="900s"/>
>>   </instance_attributes>
>>   <operations>
>>     <op id="ldirectord-monitor-2m" interval="2m" name="monitor"
>> timeout="20s"/>
>>     <op id="ldirectord-start-0" interval="0" name="start"
>> timeout="90s"/>
>>     <op id="ldirectord-stop-0" interval="0" name="stop"
>> timeout="100s"/>
>>   </operations>
>> </primitive>
>>
>>
>> crm_mon shows this initially:
>> Migration summary:
>> * Node lvsuat1a.intranet.aeroplan.com:
>>   ldirectord: migration-threshold=3 fail-count=2 last-failure=' ~P'
>>
>>
>> but it never changes. The failcount never resets. Am I missing something?
>>
>
> No, it won't reset in 1.0 thats something new in 1.1
> In 1.0 it becomes ignored after the specified interval.
> _______________________________________________
> Linux-HA mailing list
> Linux-HA@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems