Hi, I have a two-node cluster running heartbeat 2.0.7, and I've experienced unintentional fail-overs twice. I think I just don't get the algorithm it uses to decide when to fail over. I must be being dense. So I'm wondering...
Given a healthy primary node that is currently active with heartbeat (it has the IP address), what could cause it to *initiate* a fail-over to the secondary node? Sure, the secondary may want to take over if my heartbeat timer settings are off, if it doesn't receive the heartbeat in time, if it can't see the primary, etc. But those are all events initiated by the secondary node. So, provided my primary feels healthy (it can see its upstream gateway, for example), what might make it initiate the transfer of power?

And maybe the answer lies in the secondary and I'm just not understanding. Is it possible for the secondary to send some type of message to the primary that essentially means "back off, I'm taking it from here"?

Juicy details of our setup are here (http://marc.info/?l=linux-ha&m=122520665324046&w=2). But I think these are the fundamental points that I just can't get my head around.

Thanks so much for any ideas!

-Rick
