Alex,

Thanks for the info.

Would you by chance mind posting your mon script?


-Josh




On Wed, 2009-12-09 at 21:22 -0500, Alex Dean wrote:

> On Dec 9, 2009, at 5:34 PM, Mullis, Josh (CCI - Atlanta) wrote:
> 
> > Shouldn't node1 release the resource if the ping node (1.1.1.1) is  
> > down?
> 
> That's not how ipfail works.
> 
> ipfail presumes that the two nodes are always in contact.  Based on  
> their ability to ping 1.1.1.1, they will decide which one should hold  
> your resources.  If the two nodes lose contact with each other, you  
> have a split-brain and all bets are off.
> 
> "Note that ipfail needs redundant communications media to work  
> correctly - because it won't cause a failover on its own unless it can  
> contact the other cluster member. In other words, if you're pinging on  
> the same media as the only heartbeat channel configured, you're  
> destined to be disappointed in ipfail."
> http://linux-ha.org/ipfail
> 
> If your ethernet connection is your only medium your cluster nodes can  
> use to communicate, ipfail really isn't much use.  You could try  
> adding something like mon.  I've written a mon alert which causes  
> heartbeat to go standby if it can't ping its gateway IP, and this has  
> worked pretty well.  Mon's really quite easy to learn, and I think it  
> only took an afternoon of tinkering to get a 'go standby' action I was  
> happy with.
> 
> http://linux-ha.org/mon
> http://mon.wiki.kernel.org/index.php/Main_Page
> 
> You could also switch to a v2 heartbeat+pacemaker configuration, which  
> will get you resource-level monitoring.  In this case, the ability to  
> ping 1.1.1.1 is your 'resource'.  I believe you'd then use pingd  
> rather than ipfail.  I haven't done this personally, but I'm sure  
> many/most on this list have.
> 
> alex
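
For illustration only, the 'go standby' idea described above (ping the
gateway, and hand resources to the other node if it stops answering) could
be sketched roughly as below. This is not Alex's script and does not use
mon's alert interface; it is a standalone approximation. The gateway
address, the hb_standby path, and the function names are all assumptions
and would need adjusting for a real cluster.

```python
#!/usr/bin/env python3
"""Hypothetical sketch: ask heartbeat to go standby when the gateway is unreachable."""
import subprocess

# Assumption: placeholder address; substitute your real gateway IP.
GATEWAY_IP = "192.0.2.1"
# Assumption: hb_standby ships with heartbeat, but its path varies by distro.
HB_STANDBY = "/usr/lib/heartbeat/hb_standby"
PING_COUNT = 3


def gateway_reachable(ip, count=PING_COUNT, run=subprocess.run):
    """Return True if ping exits 0, i.e. at least one echo reply came back."""
    result = run(
        ["ping", "-c", str(count), "-W", "2", ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def decide(reachable):
    """Map the ping result to an action name."""
    return "stay" if reachable else "standby"


def main(run=subprocess.run):
    """Check the gateway once and request standby if it is unreachable."""
    action = decide(gateway_reachable(GATEWAY_IP, run=run))
    if action == "standby":
        # Hands this node's resources to the peer, provided heartbeat
        # can still reach the peer over some other medium.
        run([HB_STANDBY], check=False)
    return action
```

Calling main() from a periodic job (or from whatever mon runs when its
ping monitor fails) would give the basic behaviour. A real setup would
probably also want to avoid flapping, for example by requiring several
consecutive failures before going standby.
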
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems