On 7/20/07, Andrew Grimberg <[EMAIL PROTECTED]> wrote:
> Greetings folks,
>
> Perhaps it's been answered and I'm just not finding it, but here goes.
> I've got two clusters using pingd.  One is a firewall cluster, the other
> a router cluster.  Each cluster is monitoring the floating address of
> the other cluster using pingd for connectivity.  Either I've got my
> values set way too tight on the monitoring, or I shouldn't be using the
> floats for the actual connectivity monitoring.

Yeah, not generally a good idea. If you happen to perform a monitor
action at the time the IP is being moved, then you're in trouble.

>
> Currently my pingd is set for dampen 5s, multiplier 100; monitor set for
> 1s interval, 2s timeout.  This is the same on both clusters, just
> different pingd host lists of course.

Possibly a little aggressive... ping can take a surprising amount of
time to complete in some circumstances.

>
> Here's the problem I'm running into and what's prompting this email.
> Yesterday one of our router nodes tanked hard (kernel panic) and during
> the transition over to the second node our FW cluster apparently decided
> that it needed to fail over as well, but it only partially failed.

This is why STONITH was invented :-)

> That
> is, the addresses came up on both nodes, causing all sorts of routing
> headaches and dropping the network behind the FW cluster offline from
> outside network access.  Which really seems to indicate a partial split
> brain for whatever reason.  I'm working on this issue on the side and
> will be switching from network-only heartbeat to null modem shortly.
>
> Have I implemented this all wrong or is it just that my timing is too
> tight?
>
> My thanks,
> -Andy-
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
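To put rough numbers on the timing question, here is a minimal Python sketch. It assumes the usual pingd behaviour of publishing (reachable ping hosts x multiplier) as the node attribute, and it uses a deliberately simplified worst-case model in which a connectivity check that starts just as the peer's floating address goes away fails whenever the move takes longer than the check's timeout. The function names and the 3s failover time are hypothetical and only there for illustration; the multiplier and timeout are the ones quoted in the thread.

```python
def pingd_score(reachable_hosts: int, multiplier: int) -> int:
    """Attribute value under the assumed rule: reachable hosts times multiplier."""
    return reachable_hosts * multiplier


def check_fails(outage_s: float, timeout_s: float) -> bool:
    """Simplified worst case: a check that starts as the outage begins
    fails when the outage lasts longer than the check's timeout."""
    return outage_s > timeout_s


if __name__ == "__main__":
    multiplier = 100   # from the thread
    timeout_s = 2.0    # monitor timeout from the thread
    failover_s = 3.0   # hypothetical time for the peer to move its float

    print("score with float reachable:", pingd_score(1, multiplier))
    print("score with float unreachable:", pingd_score(0, multiplier))
    print("check fails during the move:", check_fails(failover_s, timeout_s))
```

Under this model, any address move on the peer that outlasts the 2s timeout looks like lost connectivity, which is one reason to ping a fixed address rather than a float.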
