On Wed, 2010-05-05 at 13:29 +0200, Dejan Muhamedagic wrote:
> If these servers have a lights-out device and the power
> distribution is fairly reliable, that could be an alternative for
> fencing.
They do have an IPMI device, and it does work. I am trying to insulate against a failure of the NIC or cable by having a second stonith device.

The cluster I have now is primarily for testing, but eventually we will be implementing critical services (e.g. DNS, e-mail, DHCP, and authentication) in virtual machines running on a cluster like this one, so part of the testing process is to learn what can and can't be done and where the potential gotchas are.

I have discovered that if I simulate a cable failure by unplugging the cable, bad things happen because stonith cannot succeed. I would not want my DNS system to be vulnerable to a single cable failing, so I am looking for ways to guard against it.

A complete power outage on one of the nodes also results in bad things when using IPMI: again, stonith cannot succeed, so the remaining server will not take over the resources. Yes, these are dual power supply servers, so it is unlikely that anything other than human error (or possibly a motherboard failure?) would cause only one of the servers to lose power completely, but I am still looking to determine whether there is a way to guard against this.

Right now I have a "meatware" stonith device set up so that I can at least log in remotely and manually force the remaining server to take over, but I am looking for something more automatic. It would be nice to avoid those 3AM phone calls )-:

I may take a shot at modifying the external/rackpdu stonith plugin at some point. We can't be the only ones in the world using dual power supply servers. I'll probably start by unplugging one of the power supplies on each server and making sure I understand how to use the plugin in single-outlet mode, then try making the modifications to support dual outlets.

--Greg

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
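The two ideas in the message above (falling back to a second stonith device when IPMI is unreachable, and fencing a dual power supply server through two PDU outlets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the external/rackpdu plugin or any Linux-HA API: `Pdu`, `FakePdu`, `reset_dual_outlet` and `fence` are hypothetical names, and the outlet numbers are made up. The point it shows is ordering: every outlet feeding the node must be switched off, and confirmed off, before any is switched back on; cycling the outlets one at a time would leave the node running on its other power supply throughout.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class Pdu(Protocol):
    """Hypothetical switched-PDU interface (not the rackpdu plugin's API)."""

    def set_outlet(self, outlet: int, on: bool) -> None: ...

    def outlet_is_on(self, outlet: int) -> bool: ...


def reset_dual_outlet(pdu: Pdu, outlets: List[int]) -> bool:
    """Power-cycle a node fed by several outlets, all off before any on."""
    for outlet in outlets:
        pdu.set_outlet(outlet, False)
    # If any outlet still reports power, the node may still be running,
    # so report failure instead of claiming the node was fenced.
    if any(pdu.outlet_is_on(outlet) for outlet in outlets):
        return False
    for outlet in outlets:
        pdu.set_outlet(outlet, True)
    return True


def fence(node: str, devices: List[Callable[[str], bool]]) -> bool:
    """Try each fencing method in order until one reports success."""
    for device in devices:
        try:
            if device(node):
                return True
        except Exception:
            # e.g. the IPMI interface is unreachable because its NIC or
            # cable failed; fall through to the next method.
            continue
    return False


class FakePdu:
    """In-memory stand-in for a PDU, used only to exercise the sketch."""

    def __init__(self, outlets: List[int]) -> None:
        self.state: Dict[int, bool] = {o: True for o in outlets}
        self.log: List[Tuple[int, bool]] = []

    def set_outlet(self, outlet: int, on: bool) -> None:
        self.state[outlet] = on
        self.log.append((outlet, on))

    def outlet_is_on(self, outlet: int) -> bool:
        return self.state[outlet]


def ipmi_unreachable(node: str) -> bool:
    """Simulates the pulled-cable case: the IPMI method cannot be reached."""
    raise ConnectionError("BMC not reachable")


if __name__ == "__main__":
    pdu = FakePdu([3, 4])
    ok = fence("node2", [ipmi_unreachable,
                         lambda n: reset_dual_outlet(pdu, [3, 4])])
    print(ok, pdu.log)  # True [(3, False), (4, False), (3, True), (4, True)]
```

In a real cluster the fallback ordering would be handled by the cluster's own stonith configuration rather than by a wrapper like `fence`; the wrapper is only there to make the escalation explicit.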
