Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-20 Thread Jack Berg


Hi - thanks for the response.


Dimitri Maziuk wrote:
> What do you mean by disconnecting: what's your failure scenario and
> how do you expect it to be handled?

The disconnection is the loss of the intersite link which interrupts
heartbeat comms.

In this case it's expected that both sites will acquire the resources and
become active.

However, what I want to happen is that one of the sites will give up the
resources again when it sees that the other site is up again.


Dimitri Maziuk wrote:
> Running daemons are not guaranteed (arguably, not even expected) to
> notice when the network cable is unplugged. You have to monitor the
> link and restart all processes that bind()/listen() on the interface.
>
> If your nodes are at different sites, you need to also deal with the
> loss of link at the switch, gateway, etc., and figure out which one is
> still connected to the Internet -- and gets to keep the VIP. Which in
> general can't be done from the nodes themselves.

Yes. In this case neither site has to be connected to the internet; this is
more of an internal load-balancing arrangement between two connected sites
in a customer's network.

What I found is that with auto_failback set to on in ha.cf at both sites,
the site/node listed in haresources keeps the resources when the link is
re-established, and the other site releases them.

This is the result I was looking for.
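
For anyone finding this thread later, here is a minimal sketch of the two
files involved. Only auto_failback on and the use of haresources come from my
setup as described above; the node names (site-a, site-b), the interface, the
addresses and the timing values are placeholders I made up for illustration,
so adjust them to your own network.

```
# /etc/ha.d/ha.cf
# Timing values, interface and addresses are illustrative assumptions.
keepalive 2
deadtime 30
initdead 120
udpport 694
# Address of the peer node (hypothetical); each node points at the other.
ucast eth0 192.0.2.20
# The setting discussed in this thread.
auto_failback on
# Node names must match the output of "uname -n" on each machine.
node site-a
node site-b

# /etc/ha.d/haresources (identical on both nodes)
# The first field names the preferred node; here it owns the VIP and haproxy.
site-a IPaddr::192.0.2.100/24/eth0 haproxy
```

With this layout site-a would be the node that keeps the VIP once the
intersite link comes back, and site-b would be the one that releases it.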

Regards
Jack
-- 
View this message in context: 
http://old.nabble.com/heartbeat-step-down-after-split-brain-scenario-tp31858728p31884521.html
Sent from the Linux-HA mailing list archive at Nabble.com.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


[Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Jack Berg

I have a two node cluster using heartbeat and haproxy. Unfortunately it is
impossible to provide redundant heartbeat paths between the two nodes at
different sites so it is possible for a failure to cause split brain.

To evaluate the impact I tried disconnecting the two nodes and I found that
both become active and both try to keep the VIPs after the link is restored.

Is this avoidable using the auto_failback option?





Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Dimitri Maziuk
On 06/16/2011 04:28 AM, Jack Berg wrote:
> I have a two node cluster using heartbeat and haproxy. Unfortunately it is
> impossible to provide redundant heartbeat paths between the two nodes at
> different sites so it is possible for a failure to cause split brain.
>
> To evaluate the impact I tried disconnecting the two nodes and I found that
> both become active and both try to keep the VIPs after the link is restored.

What do you mean by disconnecting: what's your failure scenario and
how do you expect it to be handled?

Running daemons are not guaranteed (arguably, not even expected) to
notice when the network cable is unplugged. You have to monitor the
link and restart all processes that bind()/listen() on the interface.

If your nodes are at different sites, you need to also deal with the
loss of link at the switch, gateway, etc., and figure out which one is
still connected to the Internet -- and gets to keep the VIP. Which in
general can't be done from the nodes themselves.

Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu


