Gianluca Cecchi wrote:
> On Thu, May 20, 2010 at 2:45 PM, mike <[email protected]> wrote:
>
>   
>> ok, I actually went ahead and did a test on my cluster. The results did
>> not occur as I would have expected.
>>
>> I failed ldirectord twice on the main node. I waited 20 minutes and saw
>> this entry in the log file:
>> May 20 08:23:10 lvsuat1a.intranet.mydomain.com pengine: [6589]: notice:
>> get_failcount: Failcount for ldirectord on
>> lvsuat1a.intranet.mydomain.com has expired (limit was 900s)
>>
>> So now I kill ldirectord again, fully expecting it to restart on the
>> same node but instead a failover occurs:
>> May 20 08:36:15 lvsuat1a.intranet.mydomain.com pengine: [6589]: WARN:
>> common_apply_stickiness: Forcing ldirectord away from
>> lvsuat1a.intranet.mydomain.com after 3 failures (max=3)
>>
>>
>>     
> So your version of pacemaker should be a 1.0.x one.
> In fact Andrew wrote that the reset is not automatic for that version, while
> it should be for upcoming 1.1
>
> Gianluca
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
>
>   

Yes, he said that in 1.0 the failcount is ignored after the specified 
interval. I wasn't sure what he meant by that. I thought perhaps he 
meant that future failures would be ignored and no failover would occur.
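
If "ignored" means the stored count is skipped once it has expired but is never reset to zero, that would match what I saw: the next failure adds to the old count and hits the limit straight away. Here is a toy model of that reading. It is not Pacemaker's code; the class and method names are made up for illustration, the timestamps are arbitrary, and only the 900s limit and max=3 come from my log.

```python
# Toy model of an "ignored but not reset" failcount. NOT Pacemaker's implementation.
# FailCount, record_failure, effective and forced_away are hypothetical names.

MIGRATION_THRESHOLD = 3   # "max=3" in the log
FAILURE_TIMEOUT = 900     # "limit was 900s" in the log


class FailCount:
    def __init__(self):
        self.count = 0            # stored count, never reset in this model
        self.last_failure = None  # time of the most recent failure, in seconds

    def record_failure(self, now):
        # Every failure adds to the stored count, even if it had expired.
        self.count += 1
        self.last_failure = now

    def effective(self, now):
        # An expired count is treated as 0 for placement, but stays stored.
        if self.last_failure is not None and now - self.last_failure >= FAILURE_TIMEOUT:
            return 0
        return self.count

    def forced_away(self, now):
        return self.effective(now) >= MIGRATION_THRESHOLD


fc = FailCount()
fc.record_failure(0)                 # first kill
fc.record_failure(60)                # second kill
after_wait = fc.effective(60 + 1200) # 20 minutes later: expired, so ignored

fc.record_failure(1270)              # third kill, after the expiry
after_third = fc.effective(1270)     # stored count is now 3, and no longer expired
forced = fc.forced_away(1270)        # threshold reached, resource is moved away
```

Under this assumption the 20 minute wait only hides the old failures until the next one arrives, so the count would have to be cleared by hand to get a restart on the same node.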
