Greetings folks,

Perhaps it's been answered and I'm just not finding it, but here goes.
I've got two clusters using pingd.  One is a firewall cluster, the other
a router cluster.  Each cluster is monitoring the floating address of
the other cluster using pingd for connectivity.  Either I've got my
values set way too tight on the monitoring, or I shouldn't be using the
floats for the actual connectivity monitoring.

Currently my pingd is set for dampen 5s, multiplier 100; monitor set for
1s interval, 2s timeout.  This is the same on both clusters, just
different pingd host lists of course.

Here's the problem I'm running into, and what's prompting this email.  Yesterday
one of our router nodes tanked hard (kernel panic) and during the
transition over to the second node our FW cluster apparently decided
that it needed to fail over as well, but it only partially failed.  That
is, the addresses came up on both nodes causing all sorts of routing
headaches and dropping the network behind the FW cluster offline from
outside network access.  That really seems to indicate a partial split
brain, for whatever reason.  I'm working on that issue on the side and
will be switching from network-only heartbeat to null modem shortly.

Have I implemented this all wrong or is it just that my timing is too
tight?

My thanks,
-Andy-

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
