On Mon, Nov 22, 2010 at 11:13 PM, Andrew Miklas <[email protected]> wrote:
> Hi,
>
> On 20-Nov-10, at 12:47 AM, Andrew Beekhof wrote:
>
>> What do you think you gain by not increasing the timeout?
>> We don't sit around doing nothing if it completes in only a fraction
>> of the allocated time.
>>
>
> It's possible I should increase it.  The problem is that I'm not aware
> of an upper bound on the time it takes an elastic IP on AWS to be
> reassigned from one node to another, so I wasn't really sure how to
> set the value.  Mostly, though, I want to ensure that the IPs are
> always bound to some host (i.e. the IP resources should never be
> stopped or unmanaged, except for a lack of running nodes).  Since I
> had to run the monitor check once every 60-90s anyway, I figured that
> would take care of any failed previous runs.
>
>
>> In any case, if you really want, check out the start-failure-is-fatal
>> option (man pengine).
>
> I couldn't track down this man page (it is apparently missing from the
> Debian packages).  The guide "Pacemaker 1.0: Configuration Explained"
> says that "when set to false, the cluster will instead use the
> resource's failcount and value for resource-failure-stickiness".
>
> Am I correct in thinking that if this value is set to true and a
> resource fails to start, it is kicked into unmanaged mode?

No, it just stops the resource from being started again on the host where the start failed.
True is the default; false gives the behavior you were asking for.
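
As a rough illustration of the difference, here is a toy model in Python. It is not Pacemaker code: the function name and the `threshold` parameter are made up for this sketch, and `threshold` simply stands in for whatever failure limit the cluster is configured to compare the failcount against.

```python
INFINITY = 1_000_000  # Pacemaker treats scores of this magnitude as infinite


def node_allowed_after_start_failure(failcount, start_failure_is_fatal, threshold):
    """Toy model: may the resource be retried on the node where start failed?

    failcount is the number of failures already recorded on that node,
    not counting the start failure being handled now.
    """
    if start_failure_is_fatal:
        # Default behavior: one failed start excludes this node outright.
        failcount = INFINITY
    else:
        # Otherwise the failure is only counted, and the node stays
        # eligible until the count reaches the configured limit.
        failcount += 1
    return failcount < threshold
```

With the default (true) the node is ruled out after the first failed start no matter what the limit is; with false the node stays a candidate until the count reaches the limit.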

>> Did you set cluster-recheck-interval appropriately? (man crmd)
>
> Nope -- that did it, thanks.
>
>
> Another question -- is it possible to define resources that do not
> have stop actions?  On AWS, there is no need to explicitly stop an
> elastic IP before reassigning it to another node (the IP will be
> automatically released from a host before it is assigned to another).

By whom?

> Doing so unnecessarily slows down the IP flip operation.

In theory you could make it a no-op, but yes, it must be defined.

>
> I've played around a bit with making the stop action in the resource
> script be a no-op that always returns OCF_SUCCESS.  This seems to
> work, but I can imagine that there will be situations where Pacemaker
> may get confused (a call to "monitor" will show a resource is still
> running, even though an immediately prior call to "stop" returned
> success).

Yep, that's the downside of a no-op.
Does it really take so long to remove the IP?
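
To spell out the inconsistency, here is a minimal sketch of an agent whose stop is a no-op. The class and its `bound` flag are hypothetical stand-ins for the real resource script and the real AWS state; only the two exit codes are the standard OCF values.

```python
OCF_SUCCESS = 0       # standard OCF exit code: action succeeded
OCF_NOT_RUNNING = 7   # standard OCF exit code: resource is cleanly stopped


class ElasticIPAgent:
    """Hypothetical agent; `bound` stands in for the real AWS state."""

    def __init__(self):
        self.bound = False

    def start(self):
        self.bound = True  # placeholder for the real association call
        return OCF_SUCCESS

    def stop(self):
        # No-op: report success without releasing the address.
        return OCF_SUCCESS

    def monitor(self):
        return OCF_SUCCESS if self.bound else OCF_NOT_RUNNING


agent = ElasticIPAgent()
agent.start()
stop_rc = agent.stop()
monitor_rc = agent.monitor()
# stop claimed success, yet monitor still reports the resource as running
print(stop_rc, monitor_rc)
```

After a successful stop the cluster expects monitor to return OCF_NOT_RUNNING on that node; here it still returns OCF_SUCCESS until the address has actually been moved elsewhere.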

>
>
> Thanks,
>
>
> Andrew
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
