On 2012-10-17T09:18:06, Michael Schwartzkopff <[email protected]> wrote:
> If you have errors in the network you eventually lose packets.
> corosync/pacemaker doesn't like this and sometimes reacts to heavy packet
> loss.
It's not really pacemaker that is affected, but corosync's totem
protocol implementation. That should be discussed on the corosync
mailing list; robustness improvements are always welcome.
It would be helpful if we added a "fail-fast" mode that checked the
node's local health - starting with free disk space (which many parts of
the stack have a problem with too), scheduling latency, etc. - and then
stopped tickling the watchdog as soon as any of those checks failed.
Since dealing with byzantine failure modes is such a pain where the sun
doesn't shine, transforming them into "clean" node failures is a good
idea.
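
To make the idea concrete, here is a rough sketch of such a fail-fast
loop in Python. It is not how sbd or any existing daemon does it; the
function names, the thresholds (64 MB free, 0.5 s wake-up overshoot) and
the one-second period are all made up for illustration. The only thing
it assumes about the platform is the usual Linux watchdog device, where
any write to /dev/watchdog counts as a keepalive.

```python
import shutil
import time


def disk_ok(path="/", min_free_bytes=64 * 1024 * 1024):
    """True if the filesystem holding `path` has enough free space.

    The 64 MB default is an arbitrary illustrative threshold.
    """
    return shutil.disk_usage(path).free >= min_free_bytes


def sched_latency_ok(interval=0.01, max_overshoot=0.5):
    """Crude scheduling latency probe: sleep briefly, see how late we wake."""
    start = time.monotonic()
    time.sleep(interval)
    overshoot = time.monotonic() - start - interval
    return overshoot <= max_overshoot


def run(checks, feed, period=1.0, max_rounds=None, sleep=time.sleep):
    """Feed the watchdog while every check passes; stop at the first failure.

    `feed` is any callable that tickles the watchdog. Returns the number
    of times the watchdog was fed. `max_rounds` exists only so the loop
    can be bounded in tests.
    """
    fed = 0
    while max_rounds is None or fed < max_rounds:
        if not all(check() for check in checks):
            break  # fail fast: stop feeding and let the watchdog fire
        feed()
        fed += 1
        sleep(period)
    return fed


def feed_linux_watchdog(device="/dev/watchdog"):
    """Hypothetical entry point: run the loop against the real device.

    Needs root. Whether the timer keeps running once the device is
    closed depends on the driver and its nowayout setting.
    """
    with open(device, "wb", buffering=0) as wd:
        return run([disk_ok, sched_latency_ok], lambda: wd.write(b"\0"))
```

The point of passing the checks in as a list is that further probes
(memory pressure, I/O latency, whatever) can be added without touching
the loop, and a single failing check is enough to turn a half-broken
node into a clean node failure.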
There are some daemons which do some of this (sbd included), but none of
them integrates all of these checks, nor do we make running one a
"mandatory recommendation".
Regards,
Lars
--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB
21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems