I could have this completely wrong, but:

This is the current behaviour of ipfail with:
 - a simple active/passive v1 setup (latest production version of Heartbeat),
 - only a single network heartbeat path, using the same NIC as normal
   network traffic,
 - ping nodes configured.

1) All resources are running on the master; the slave is connected and healthy.
2) Unplug the slave's network card.
3) The slave loses contact with the master and the ping node, assumes failover
   and goes live.
   - It sends gratuitous ARPs for the resources (which fail because the NIC is
     unplugged).
4) Unplug the master's NIC.
5) Plug the slave's NIC back in.
6) The slave can see the ping node again and assumes all is A-OK (which it
   kind of is). BUT the switch fabric still thinks that the master owns the
   resource, because it hasn't seen the gratuitous ARP.
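For reference, the gratuitous ARP the slave failed to get out in step 3 is just a broadcast ARP packet in which the sender announces its own IP-to-MAC mapping (sender IP and target IP are both the virtual IP), so neighbours update their ARP caches and the switch learns the MAC on the new port. Below is a minimal Python sketch of building such a frame, assuming Ethernet and IPv4; it is NOT Heartbeat's own send_arp code, and the MAC/VIP values you would pass in are hypothetical.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def gratuitous_arp_frame(src_mac: str, vip: str) -> bytes:
    """Build a broadcast gratuitous ARP request announcing vip at src_mac."""
    mac = mac_bytes(src_mac)
    ip = socket.inet_aton(vip)
    # Ethernet header: broadcast dst MAC, our src MAC, EtherType 0x0806 (ARP)
    eth = BROADCAST_MAC + mac + struct.pack("!H", 0x0806)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, op=1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP are ours; target MAC is zeroed; target IP is the SAME VIP,
    # which is what makes the ARP "gratuitous"
    arp += mac + ip + b"\x00" * 6 + ip
    return eth + arp
```

On Linux the frame could be sent (as root) through an `AF_PACKET` raw socket bound to the interface, i.e. `socket.socket(socket.AF_PACKET, socket.SOCK_RAW)`, then `bind((iface, 0))` and `send(frame)`. As far as I know, Heartbeat's IPaddr resource does the equivalent via its send_arp helper when it takes over an address.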

BEST solution: redundant heartbeat media...

But assuming that is not possible...

Could ipfail not resend the gratuitous ARPs IF the ping node comes
back after a partition?
i.e. it knows the network has just come back up, so it should tell everyone
about the resources it holds...
I can't see any downside to that?
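To make the suggestion concrete, here is a rough Python sketch of the logic I have in mind: watch the ping-node result, and only on the transition from unreachable to reachable re-announce every VIP this node currently holds. This is a hypothetical illustration, not ipfail's actual code; `PingNodeWatcher`, `held_vips` and the `announce` callback are names I've made up (the callback would send a gratuitous ARP for one address).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PingNodeWatcher:
    """Hypothetical sketch of the proposed ipfail behaviour (not real ipfail code)."""

    held_vips: List[str]              # VIPs this node currently owns
    announce: Callable[[str], None]   # e.g. sends a gratuitous ARP for one VIP
    reachable: bool = True            # last known ping-node state

    def on_ping_result(self, ok: bool) -> bool:
        """Feed one ping-node result; return True if announcements were resent."""
        came_back = ok and not self.reachable  # only the down -> up edge counts
        self.reachable = ok
        if came_back:
            for vip in self.held_vips:
                self.announce(vip)
        return came_back
```

Triggering only on the edge (rather than on every successful ping) means the node re-announces once per partition heal instead of flooding the LAN with ARPs.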

But then I'm probably not looking at it the right way, AND our app
doesn't care if both nodes go live (split brain).
I also haven't tried pingd, but I assume it behaves similarly.

PS. Heartbeat is awesome; we've never had a production problem with it
across more than 800 deployed load balancers.
If you need any kind of sponsorship / hardware donation, let me know off list.

--
Regards,

Malcolm Turnbull.

Loadbalancer.org Ltd.
Phone: +44 (0)870 443 8779
http://www.loadbalancer.org/
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems