On Wed, 2010-05-05 at 13:29 +0200, Dejan Muhamedagic wrote:

> If these servers have a lights-out device and the power
> distribution is fairly reliable, that could be an alternative for
> fencing.

They do have an IPMI device and it does work. I am trying to insulate
against a failure of the NIC or cable by having a second stonith device.

The cluster I have now is primarily for testing, but eventually we will
be implementing critical services (e.g. DNS, e-mail, DHCP, and
authentication) in virtual machines running on a cluster like this one,
so part of the testing process is to learn what can and can't be done
and where the potential gotchas are. I have discovered that if I
simulate a cable failure by unplugging the cable, stonith cannot succeed
and the surviving node never takes over the resources. I would not want
my DNS system to be vulnerable to a single failed cable, so I am looking
for ways to guard against it.

A complete power outage on one of the nodes is just as bad when using
IPMI: again stonith cannot succeed, so the remaining server will not take
over the resources. Yes, these are dual-power-supply servers, so apart
from human error (and possibly a motherboard failure as well?) it is
unlikely that only one of them would lose power completely, but I would
still like to find out whether there is a way to guard against this. Right now I have a
"meatware" stonith device set up so that I can at least log in remotely
and manually force the remaining server to take over, but I am looking
for something more automatic. It would be nice to avoid those 3AM phone
calls )-:

I may take a shot at modifying the external/rackpdu stonith plugin at
some point; we can't be the only ones in the world using dual-power-supply
servers. I'll probably start by unplugging one of the power supplies on
each server to make sure I understand how to use the plugin in
single-outlet mode, then try modifying it to support dual outlets.
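The main gotcha with dual outlets is ordering: a server with two power supplies stays up as long as either one has power, so a "reset" has to switch every outlet off before switching any of them back on. Cycling the outlets one at a time would never actually fence the node. The sketch below illustrates only that ordering in Python. The actual external/rackpdu plugin is a script with its own interface, and `FakePdu`, `set_outlet` and `reset_dual_outlet` are hypothetical names that stand in for whatever commands the real PDU accepts.

```python
from dataclasses import dataclass, field


@dataclass
class FakePdu:
    """Hypothetical in-memory stand-in for a rack PDU (not a real API)."""
    states: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def set_outlet(self, outlet: int, on: bool) -> bool:
        # Record the command and the resulting outlet state; always succeeds.
        self.states[outlet] = on
        self.log.append((outlet, "on" if on else "off"))
        return True


def reset_dual_outlet(pdu, outlets) -> bool:
    """Power-cycle a node fed by several outlets (hypothetical helper).

    Every outlet is switched off before any is switched back on, because
    a dual-PSU server keeps running while any one supply has power.
    Returns True only if every command succeeded.
    """
    # Phase 1: cut power on all outlets; stop at the first failure.
    for outlet in outlets:
        if not pdu.set_outlet(outlet, False):
            return False
    # Phase 2: the node is now fully without power, so restore it.
    for outlet in outlets:
        if not pdu.set_outlet(outlet, True):
            return False
    return True


if __name__ == "__main__":
    pdu = FakePdu()
    ok = reset_dual_outlet(pdu, [3, 4])
    print(ok, pdu.log)
```

Run against the fake PDU with outlets 3 and 4, the log shows both "off" commands before either "on", which is the property a dual-outlet version of the plugin would need to preserve.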

--Greg


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
