On Tue, 2013-01-08 at 09:18 +1100, Andrew Beekhof wrote:

> > On Fri, 2012-12-28 at 14:54 -0700, Greg Woods wrote:
> >
> >> The problem is that either node can come up and run all the resources,
> >> but as soon as I bring the other node online, it briefly looks normal,
> >> but as soon as the stonith resource starts, the currently running node
> >> gets fenced and the new node takes over all the resources. Then the
> >> fenced node comes up, fences the other node and takes over, etc. Death
> >> match.

> That's odd. Normally it's a firewall issue. Did you happen to choose a
> different port, perhaps?

Close, but not quite. I finally figured out what was going on: the
death match started again while I was reconfiguring the cluster from
scratch, and this time I had a better idea of what was causing it. It
started as soon as I added "xend" as a resource. A little trial and
error showed that heartbeat does not work over an interface that also
has a Xen bridge attached to it. This was unexpected, because all other
traffic on that interface (ssh connections, IPMI connections, etc.)
works fine with the bridge active; only heartbeat is affected. But it
was absolutely reproducible: if I started xend by hand instead of
having it as a cluster resource, I again got a death match. A careful
reading of the logs showed that heartbeat was declaring the other node
dead, so for some reason heartbeat communication was lost as soon as
the bridge was activated.

I got the cluster running with xend by moving heartbeat to a different
interface. This is less than ideal, because that interface is on a
network that is also used for other purposes and has other hosts
attached to it, but since this is only a test cluster, that's
acceptable.
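For anyone hitting the same symptom, one quick check is whether the
interface heartbeat uses has become a bridge port (or is itself a
bridge) once xend is up. Below is a minimal sketch in Python, assuming a
Linux host with sysfs mounted at /sys. The function name bridge_status
is hypothetical, and it only reports bridge membership; it does not
test heartbeat traffic itself.

```python
import os


def bridge_status(iface, sysfs_root="/sys/class/net"):
    """Report how a network interface relates to a Linux bridge.

    Assumes the usual sysfs layout (an assumption, not checked here):
      <root>/<iface>/bridge  is a directory if iface is a bridge device
      <root>/<iface>/brport  is a directory if iface is enslaved to a bridge
    """
    base = os.path.join(sysfs_root, iface)
    if not os.path.isdir(base):
        return "missing"          # no such interface
    if os.path.isdir(os.path.join(base, "bridge")):
        return "bridge"           # the interface is itself a bridge
    if os.path.isdir(os.path.join(base, "brport")):
        return "bridge-port"      # the interface is attached to a bridge
    return "plain"                # ordinary interface, no bridge involved


if __name__ == "__main__" and os.path.isdir("/sys/class/net"):
    # List every interface with its bridge relationship.
    for name in sorted(os.listdir("/sys/class/net")):
        print(name, bridge_status(name))
```

Running it before and after starting xend shows whether the interface
named in the heartbeat configuration changed role when the bridge came
up.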

--Greg


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems