On 7/20/07, Andrew Grimberg <[EMAIL PROTECTED]> wrote:
> Greetings folks,
>
> Perhaps it's been answered and I'm just not finding it, but here goes.
> I've got two clusters using pingd.  One is a firewall cluster, the other
> a router cluster.  Each cluster is monitoring the floating address of
> the other cluster using pingd for connectivity.  Either I've got my
> values set way too tight on the monitoring, or I shouldn't be using the
> floats for the actual connectivity monitoring.

yeah, not generally a good idea.
if you happen to perform a monitor action at the time the IP is being
moved then you're in trouble.

>
> Currently my pingd is set for dampen 5s, multiplier 100; monitor set for
> 1s interval, 2s timeout.  This is the same on both clusters, just
> different pingd host lists of course.

Possibly a little aggressive... ping can take a surprising amount of
time to complete in some circumstances.
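To make the dampen/multiplier interaction concrete, here is a rough simulation. It is a simplified model, not pingd's actual code. It assumes the score is multiplier × reachable ping hosts, that a changed score is only committed after it has held steady for `dampen` seconds, and that a single ping host (the other cluster's floating address) is sampled once per second. `DampenedScore` and `simulate` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DampenedScore:
    """Hypothetical, simplified model of a dampened pingd-style score.

    Assumption: score = multiplier * reachable ping hosts, and a changed
    score is only committed once it has held steady for `dampen` seconds.
    """
    multiplier: int
    dampen: float
    committed: int = 0
    pending: Optional[int] = None
    pending_since: float = 0.0

    def observe(self, now: float, reachable: int) -> int:
        value = self.multiplier * reachable
        if value == self.committed:
            # Back to the committed value: discard any pending change.
            self.pending = None
        elif value != self.pending:
            # A new candidate value: start the dampen timer.
            self.pending = value
            self.pending_since = now
        elif now - self.pending_since >= self.dampen:
            # Candidate held steady for the whole dampen window: commit it.
            self.committed = value
            self.pending = None
        return self.committed


def simulate(outage_start: int, outage_len: int, horizon: int = 30,
             multiplier: int = 100, dampen: float = 5.0) -> List[int]:
    """Sample one ping host once per second; it is unreachable for
    `outage_len` seconds starting at `outage_start` (e.g. while the
    other cluster's floating IP moves). Returns the committed score per second."""
    score = DampenedScore(multiplier=multiplier, dampen=dampen,
                          committed=multiplier)  # host reachable at start
    history = []
    for t in range(horizon):
        reachable = 0 if outage_start <= t < outage_start + outage_len else 1
        history.append(score.observe(float(t), reachable))
    return history


if __name__ == "__main__":
    print(min(simulate(10, 3)))  # short outage: score never drops
    print(simulate(10, 8)[15])   # long outage: score committed as 0 at t=15
```

Under these assumptions, with dampen 5s and multiplier 100, a 3s gap while the floating address moves never changes the committed score. An 8s gap, however, drops it to 0 at t=15, and it only climbs back to 100 at t=23. That is the kind of window in which the other cluster may start moving its own resources.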

>
> Here's the problem I'm running into, and what's prompting this email.
> Yesterday one of our router nodes tanked hard (kernel panic), and during
> the transition over to the second node our FW cluster apparently decided
> that it needed to fail over as well, but it only partially failed over.

This is why STONITH was invented :-)

> That is, the addresses came up on both nodes, causing all sorts of
> routing headaches and cutting the network behind the FW cluster off
> from outside access. That really seems to indicate a partial split
> brain for whatever reason. I'm working on this issue on the side and
> will be switching from network-only heartbeat to a null modem shortly.
>
> Have I implemented this all wrong or is it just that my timing is too
> tight?
>
> My thanks,
> -Andy-
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems