[Linux-HA] Two node cluster monitoring configuration to ignore failing on restart

Florian Crouzat Thu, 29 Sep 2011 02:49:05 -0700

Hi,

I'm running a two-node cluster where all the resources have to run on the
same node, and a failed resource must not trigger a failover.


I'm having trouble configuring the following behavior:
 * An LSB resource "foo" is monitored every 10 seconds, and the cluster must
try to restart it when the status check returns a bad code;
 * If that restart fails, I don't want the cluster to do anything more for
now, just carry on with a failed resource.

All my resources are linked by colocation and order constraints, so right
now a failed restart of this resource moves everything to the other node.

My test case is to put "exit 4" in the start section of the foo initscript
and then run 'kill -KILL $(pidof foo)'.
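
For reference, the failure injection described above can be sketched roughly
as follows. This is only an illustration, not the real initscript: the
function name foo_init is hypothetical, and it uses "return" where the actual
script would use "exit". Under the LSB conventions, 4 from a start action
means insufficient privilege, and 3 from a status action means the program is
not running.

```shell
#!/bin/sh
# Hypothetical stand-in for /etc/init.d/foo, trimmed to the actions the
# test case touches. Assumption: the real script follows LSB exit codes.
foo_init() {
    case "$1" in
        start)
            # Injected failure: every start attempt reports an error,
            # so a restart triggered by the monitor can never succeed.
            return 4
            ;;
        status)
            # LSB status codes: 0 = running, 3 = not running.
            if pidof foo >/dev/null 2>&1; then
                return 0
            else
                return 3
            fi
            ;;
        *)
            echo "Usage: foo {start|status}" >&2
            return 2
            ;;
    esac
}
```

With start failing like this, killing the daemon makes the next monitor
report "not running", and the restart the cluster attempts then fails as
well, which is exactly the situation I want the cluster to tolerate.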

I tried the following configuration:

primitive bind lsb:foo \
        meta target-role="Started" \
        op monitor on-fail="restart" interval="10s" \
        op start on-fail="ignore" interval="0"

and

primitive bind lsb:foo \
        meta target-role="Started" \
        op monitor on-fail="restart" interval="10s" OCF_CHECK_LEVEL="10" \
        op monitor on-fail="ignore" interval="60s" OCF_CHECK_LEVEL="20"

I believe the first configuration doesn't work because "op start" only
applies to a /real/ start of the service, not to a restart triggered by
"op monitor". I don't really understand the second configuration, but it
doesn't work either.

The "restart-on-failure" part on its own is easy and works. I just can't
find a way to ignore a failing restart.

Any help appreciated.

Florian





