Hi,

I have a two-node cluster running heartbeat 2.0.7 and I've experienced 
unintentional fail-overs twice.  I think I just don't get the algorithm for how 
it decides to fail over.  I must be being dense.  So I'm wondering...

Given a healthy primary node that is currently active with heartbeat (it has 
the IP address), what could cause it to *initiate* a failover to the secondary 
node?  Sure, the secondary may want to take over if my heartbeat timer settings 
are off, if it doesn't receive the heartbeat in time, if it can't see the primary, 
etc.  But those are all events initiated by the secondary node.  So, provided my 
primary feels healthy (it can see its upstream gateway, for example), what 
might make it initiate the transfer of power?

And maybe the answer lies in the secondary and I'm just not understanding.  Is it 
possible for the secondary to send some type of message to the primary that essentially 
means "back off, I'm taking it from here"?

Juicy details of our setup are here 
(http://marc.info/?l=linux-ha&m=122520665324046&w=2).  But I think these are 
the fundamental points that I just can't get my head around.

Thanks so much for any ideas!

-Rick

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
