Greetings folks,

Perhaps this has been answered and I'm just not finding it, but here goes. I've got two clusters using pingd. One is a firewall cluster, the other a router cluster. Each cluster uses pingd to monitor the floating address of the other cluster for connectivity. Either I've got my values set way too tight on the monitoring, or I shouldn't be using the floating addresses for the actual connectivity monitoring.
Currently my pingd is set for dampen 5s, multiplier 100; the monitor operation is set for a 1s interval and a 2s timeout. This is the same on both clusters, just with different pingd host lists of course.

Here is the problem I'm running into and what's prompting this email. Yesterday one of our router nodes tanked hard (kernel panic), and during the transition over to the second node our FW cluster apparently decided that it needed to fail over as well, but it only partially failed over. That is, the addresses came up on both nodes, causing all sorts of routing headaches and cutting the network behind the FW cluster off from outside access. That really seems to indicate a partial split brain for whatever reason. I'm working on that issue on the side and will be switching from network-only heartbeat to null modem shortly.

Have I implemented this all wrong, or is it just that my timing is too tight?

My thanks,
-Andy-
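To make the timing question concrete, here is a rough Python sketch of how the values above might interact. It is a simplified model, not pingd's actual code: it assumes the published score is the number of reachable ping hosts times the multiplier, and that a change only takes effect if it persists for the whole dampen window. The function names and the outage durations in the usage lines are hypothetical; how long the router failover really took is not known from this message.

```python
# Simplified model of the pingd settings quoted above (dampen 5s, multiplier 100).
# Assumption: score = reachable hosts * multiplier, and a change in reachability
# is only acted on if it lasts at least as long as the dampen window.

def pingd_score(reachable_hosts: int, multiplier: int = 100) -> int:
    """Score published for a node, given how many ping hosts answered."""
    return reachable_hosts * multiplier


def outage_is_published(outage_seconds: float, dampen_seconds: float = 5.0) -> bool:
    """True if an outage lasts long enough to outlive the dampen window."""
    return outage_seconds >= dampen_seconds


if __name__ == "__main__":
    # One ping host (the other cluster's floating address), reachable vs. not.
    print(pingd_score(1), pingd_score(0))
    # Hypothetical outage lengths while the floating address moves between nodes.
    for outage in (3.0, 8.0):
        print(outage, outage_is_published(outage))
```

Under this model, with a single floating address as the only ping host, any failover on the other cluster that takes longer than the dampen window drops the score to zero on both local nodes at once, which is one way a peer failover could trigger a failover here.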
