[Linux-HA] Heartbeat 2's detection of resource failures

Simon Kirby Mon, 23 Feb 2009 13:17:11 -0800

In Heartbeat version 1, when an IP address or other resource failed to
bind or start on heartbeat startup, the failure was blindly ignored and
heartbeat would just try to continue.

In Heartbeat version 2, the same case causes Heartbeat to unwind and stop
all services, leaving all nodes down until someone intervenes manually.

While it probably is a good idea to check the return code of the resource
start actions, I don't think it's useful if there is no other logic to
handle this case.  Maybe it should still behave as Heartbeat 1 did.

The next best thing would probably be to try failing back and if all
nodes are exhausted, continue on one while ignoring return codes.  This
starts getting more complicated, however...
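
To make that fallback idea concrete, here is a rough sketch in Python.
It is not Heartbeat code and does not reflect how Heartbeat 2 is
implemented; place_group, start and stop are hypothetical names, and the
start/stop callbacks are assumed to return a shell-style exit code
(0 = success).

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback type: (node, resource) -> exit code, 0 means success.
ActionFn = Callable[[str, str], int]


def place_group(
    nodes: List[str],
    resources: List[str],
    start: ActionFn,
    stop: ActionFn,
) -> Tuple[Optional[str], List[str]]:
    """Return (node running the group, resources that failed to start there)."""
    # Pass 1: strict. Any failed start disqualifies the node.
    for node in nodes:
        started = []
        for res in resources:
            if start(node, res) != 0:
                break
            started.append(res)
        else:
            # Every resource started cleanly on this node.
            return node, []
        # Unwind whatever did start before trying the next node.
        for res in reversed(started):
            stop(node, res)

    if not nodes:
        return None, list(resources)

    # Pass 2: all nodes exhausted. Continue on the first node and only
    # record failures instead of acting on them (Heartbeat 1 behaviour).
    fallback = nodes[0]
    failed = [res for res in resources if start(fallback, res) != 0]
    return fallback, failed
```

With two nodes and an IP that cannot bind anywhere, this would end up
running everything else on the first node and report the IP as failed,
rather than leaving the whole cluster down.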

It would be nice to support an "|| true"-type feature for resources that
"don't really matter"; e.g., backup partitions which need to fail over but
don't necessarily need to be HA.
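
As an illustration of what such a per-resource flag might look like, here
is a minimal Python sketch. The Resource class, its "optional" field and
start_group are all made up for this example and are not an existing
Heartbeat option; start is assumed to return a shell-style exit code.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Resource:
    name: str
    # Hypothetical flag: behaves like appending "|| true" to the start action.
    optional: bool = False


def start_group(
    resources: List[Resource],
    start: Callable[[Resource], int],
) -> Tuple[bool, List[str]]:
    """Start resources in order; return (group_ok, names of ignored failures)."""
    ignored = []
    for res in resources:
        rc = start(res)
        if rc == 0:
            continue
        if res.optional:
            # Failure of a "doesn't really matter" resource is noted, not fatal.
            ignored.append(res.name)
            continue
        # A required resource failed: the group as a whole has failed.
        return False, ignored
    return True, ignored
```

A failed backup partition marked optional would then show up in the
ignored list while the group still counts as started; a failed required
IP would still fail the group.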

Thoughts?

Simon-
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
