Re: [Linux-HA] Problems with quorum, no-quorum-policy and NMI messages

Lars Marowsky-Bree Wed, 17 Oct 2012 01:17:28 -0700

On 2012-10-17T09:18:06, Michael Schwartzkopff <[email protected]> wrote:


> If you have errors in the network you eventually lose packets.
> corosync/pacemaker doesn't like this and sometimes reacts to heavy packet
> loss.

It's not really pacemaker that is affected, but corosync's totem
protocol implementation. That should be discussed on the corosync
mailing list; robustness improvements are always welcome.

It would be helpful if we added a "fail-fast" mode that checked the
node's local health - starting with free disk space (which many parts of
the stack have a problem with too), scheduling latency, etc. - and then
stopped tickling the watchdog as soon as one of those checks failed.

Since dealing with byzantine failure modes is such a pain where the sun
doesn't shine, transforming them into "clean" node failures is a good
idea.
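
To make that concrete, here is a rough sketch in Python of what such a
fail-fast loop could look like. It is not taken from sbd or any other
existing daemon. The names (disk_ok, sched_ok, run) and both thresholds
are made up for illustration, and "tickle" stands in for whatever
actually feeds the watchdog on a given system.

```python
import shutil
import time
from typing import Callable, Iterable, Optional

# Hypothetical thresholds, chosen only for illustration.
MIN_FREE_BYTES = 100 * 1024 * 1024   # require at least 100 MiB free
MAX_SCHED_LATENCY = 0.5              # tolerate up to 0.5 s of oversleep


def disk_ok(path: str = "/", min_free: int = MIN_FREE_BYTES) -> bool:
    """True if the filesystem holding `path` has at least `min_free` bytes free."""
    return shutil.disk_usage(path).free >= min_free


def sched_ok(nap: float = 0.01, max_latency: float = MAX_SCHED_LATENCY) -> bool:
    """True if a short sleep overshoots by no more than `max_latency` seconds."""
    start = time.monotonic()
    time.sleep(nap)
    overshoot = time.monotonic() - start - nap
    return overshoot <= max_latency


def run(checks: Iterable[Callable[[], bool]],
        tickle: Callable[[], None],
        interval: float = 1.0,
        max_rounds: Optional[int] = None) -> bool:
    """Tickle the watchdog for as long as every health check passes.

    Returns False as soon as any check fails, without tickling again, so
    that the watchdog timer can expire and reset the node. Returns True
    only if `max_rounds` healthy rounds complete (useful for testing).
    """
    checks = list(checks)
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        if not all(check() for check in checks):
            return False          # fail fast: stop feeding the watchdog
        tickle()
        rounds += 1
        time.sleep(interval)
    return True
```

In a real deployment, tickle would write to the watchdog device, and
the process would have to make sure the timer stays armed after it
stops (on Linux, closing the device can disarm the watchdog, depending
on driver and configuration). The point is only that a sick node stops
tickling and gets reset, instead of limping along half-alive.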

There are some daemons which do some of this (sbd included), but none of
them integrates all of these checks, nor do we make that a "mandatory
recommendation".



Regards,
    Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer,
HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
