Hi,
I'm running a two-node cluster where all the resources have to run on the
same node, and a failed resource must not trigger any further action.
I'm having trouble configuring the following behavior:
* An LSB resource "foo" is monitored every 10 seconds, and the cluster must
try to restart it when the status check returns a bad code;
* If that restart fails, I don't want the cluster to do anything more for now,
just carry on with the resource in a failed state.
Since all my resources are linked by colocation and order constraints, a
failed restart of this resource currently moves everything to the other node.
My test case is to put "exit 4" in the start section of the foo init script
and then issue 'kill -KILL $(pidof foo)'.
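To make the test case concrete, here is a minimal sketch of what the modified init script amounts to. The function name foo_init and the stop/status branches are assumptions for illustration only, not the actual script; the only part taken from the test itself is that "start" always fails with code 4.

```shell
#!/bin/sh
# Hypothetical stand-in for the modified foo init script.
# foo_init and the stop/status branches are illustrative assumptions;
# a real init script would end by dispatching on "$1" (not done here).

foo_init() {
    case "$1" in
        start)
            # Forced failure for the test: every start attempt returns 4.
            return 4
            ;;
        stop)
            # Pretend the stop always succeeds.
            return 0
            ;;
        status)
            # LSB status convention: 0 = running, 3 = not running.
            if pidof foo >/dev/null 2>&1; then
                return 0
            else
                return 3
            fi
            ;;
        *)
            echo "Usage: foo {start|stop|status}" >&2
            return 2
            ;;
    esac
}
```

With a script like this in place, killing the process makes the next monitor report "not running", and the restart the cluster then attempts fails in the start step.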
I tried the following configuration:
primitive bind lsb:foo \
    meta target-role="Started" \
    op monitor on-fail="restart" interval="10s" \
    op start on-fail="ignore" interval="0"
and
primitive bind lsb:foo \
    meta target-role="Started" \
    op monitor on-fail="restart" interval="10s" OCF_CHECK_LEVEL="10" \
    op monitor on-fail="ignore" interval="60s" OCF_CHECK_LEVEL="20"
I believe the first configuration doesn't work because "op start" only
applies to a /real/ start of the service, not to a restart triggered by
"op monitor". I don't really understand the second configuration, but it
doesn't work either.
The "restart-on-failure" part is easy and works on its own, but I just
can't find a way to ignore a failing restart.
Any help appreciated.
Florian
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems