On Thu, May 20, 2010 at 3:30 PM, mike <[email protected]> wrote:
> Gianluca Cecchi wrote:
>> On Thu, May 20, 2010 at 2:45 PM, mike <[email protected]> wrote:
>>
>>
>>> OK, I went ahead and ran a test on my cluster. The results were not
>>> what I expected.
>>>
>>> I failed ldirectord twice on the main node. I waited 20 minutes and saw
>>> this entry in the log file:
>>> May 20 08:23:10 lvsuat1a.intranet.mydomain.com pengine: [6589]: notice:
>>> get_failcount: Failcount for ldirectord on
>>> lvsuat1a.intranet.mydomain.com has expired (limit was 900s)
>>>
>>> So now I kill ldirectord again, fully expecting it to restart on the
>>> same node but instead a failover occurs:
>>> May 20 08:36:15 lvsuat1a.intranet.mydomain.com pengine: [6589]: WARN:
>>> common_apply_stickiness: Forcing ldirectord away from
>>> lvsuat1a.intranet.mydomain.com after 3 failures (max=3)
>>>
>>>
>>>
>> So your version of pacemaker should be a 1.0.x one.
>> In fact Andrew wrote that the reset is not automatic for that version, while
>> it should be for the upcoming 1.1.
>>
>> Gianluca
>> _______________________________________________
>> Linux-HA mailing list
>> [email protected]
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> See also: http://linux-ha.org/ReportingProblems
>>
>>
>>
>
> Yes, he said that in 1.0 the failcount is ignored after the specified
> interval. I wasn't sure what he meant by that. I thought perhaps he
> meant it would ignore future failures and not fail over.

No, sorry. In 1.0 you have to clear out the fail-counts manually.
Yes, it's not ideal.
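
To make the sequence in the logs above easier to follow, here is a toy model in Python. It is only a sketch of one reading of the 1.0 behaviour, not Pacemaker code: the class and function names are hypothetical, and the assumption is that an expired fail-count is ignored but not erased, so the next failure adds to the old stored value. The 900s timeout and the threshold of 3 are taken from the log lines quoted above; the timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class FailState:
    """Toy model of one resource's failure bookkeeping on one node."""
    migration_threshold: int   # failures allowed before the resource is moved
    failure_timeout: float     # seconds after which old failures are ignored
    fail_count: int = 0        # stored counter; only cleanup() resets it
    last_failure: float = 0.0  # time of the most recent failure, in seconds

    def record_failure(self, now: float) -> None:
        # Assumption: the stored counter keeps growing even after it expired.
        self.fail_count += 1
        self.last_failure = now

    def effective_count(self, now: float) -> int:
        # An expired counter is ignored for placement, but not erased.
        if self.fail_count and now - self.last_failure >= self.failure_timeout:
            return 0
        return self.fail_count

    def forced_away(self, now: float) -> bool:
        return self.effective_count(now) >= self.migration_threshold

    def cleanup(self) -> None:
        # Stands in for clearing the fail-count by hand.
        self.fail_count = 0


def replay(cleanup_after_expiry: bool) -> bool:
    """Replay the test from the thread; return True if a failover results."""
    state = FailState(migration_threshold=3, failure_timeout=900)
    state.record_failure(now=0)      # first kill of ldirectord
    state.record_failure(now=60)     # second kill
    # Roughly 20 minutes later the counter has expired and is ignored.
    assert state.effective_count(now=1260) == 0
    if cleanup_after_expiry:
        state.cleanup()
    state.record_failure(now=2000)   # third kill
    return state.forced_away(now=2000)


if __name__ == "__main__":
    print("failover without cleanup:", replay(cleanup_after_expiry=False))
    print("failover with cleanup:   ", replay(cleanup_after_expiry=True))
```

In this model the third kill pushes the stored count from 2 to 3, which matches the "after 3 failures (max=3)" message, while clearing the count first leaves it at 1. On a real cluster the manual step is usually something like `crm resource cleanup ldirectord` from the crm shell; please check the exact syntax against your installed version.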
