Hi,

On Wed, May 05, 2010 at 09:54:00AM -0600, Greg Woods wrote:
> On Wed, 2010-05-05 at 16:21 +0200, Dejan Muhamedagic wrote:
>
> > rackpdu also works over the network. The lights-out device has
> > its own network interface. Presumably both should be connected to
> > some management network. Where's the difference?
>
> The difference is in redundancy. Both rackpdu and ipmilan use a network,
> but it's not the SAME network. So using both doesn't avoid all points of
> failure, but it avoids a NIC or cable being a *single* point of failure.
It's not a single point of failure, because something else must fail
in the first place. Clusters are not supposed to protect from more
than one failure happening within a short period of time.

> > > it is unlikely that something would happen that causes
> > > only one of the servers to completely lose power other than human error
> > > (possibly a motherboard failure as well?)
> >
> > That's an interesting question. Perhaps the server vendor can
> > tell.
>
> I think there will always be hardware failure modes that would cause the
> server to be non-functional and the ipmilan stonith to fail. I do
> understand that these situations would be very rare. It isn't like this
> is a show-stopper problem. But I am still looking to see if it can be
> insulated against.
>
> > You can have more than one stonith resource and they'll be tried
> > in a round-robin fashion until one succeeds.
>
> Yes. This is what I have now with the ipmilan first and meatware second.
> I've tested this. Brute-force killing of heartbeat does result in an
> ipmilan stonith. Powering down one of the servers causes the resources
> running on that server to be in a non-running state, because then the
> ipmilan stonith will fail, but running the meatclient program can force
> the remaining server to take over once I verify that the power to the
> other server is really, really gone. I would manually shut off outlets on
> the PDU to ensure it, but at least all of that can be done remotely
> using the server that is still up. Better still, of course, would be to
> have the remaining server shut off the outlets and take over resources
> automatically, which is what I'm aiming for here.
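For anyone trying to reproduce the setup Greg describes, here is a rough
crm shell sketch of two ipmilan stonith resources with meatware as a
fallback. The node names, addresses, and credentials are invented, and
the parameter names should be checked against the external/ipmi and
meatware plugins actually installed (`stonith -L` lists them); this is a
sketch, not a tested configuration.

```
# Sketch only -- all values below are invented; verify plugin and
# parameter names against your own installation.
primitive st-ipmi-alpha stonith:external/ipmi \
        params hostname="alpha" ipaddr="192.168.100.10" \
               userid="admin" passwd="secret" interface="lan"
primitive st-ipmi-bravo stonith:external/ipmi \
        params hostname="bravo" ipaddr="192.168.100.11" \
               userid="admin" passwd="secret" interface="lan"
primitive st-meat stonith:meatware \
        params hostlist="alpha bravo"
# Keep each ipmi resource off the node it is meant to shoot:
location l-st-alpha st-ipmi-alpha -inf: alpha
location l-st-bravo st-ipmi-bravo -inf: bravo
```

With something like this in place, the cluster tries the stonith
resources it can still run until one of them reports success, which is
the round-robin behaviour described above.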
> > The plugin picks the outlet in one of the two ways:
> >
> > - from the device itself, provided that the outlet name matches
> >   the node name
> > - from the external file which specifies the mapping
>
> I expect that allowing multiple PDU/outlet combinations to be specified
> in the config file is the only way to do this.

Well, nothing wrong with having more than one outlet have the same
name.

> > I can help with modifying the plugin if you'll do the testing.
>
> I'd be more than happy to do some testing. That's what the test cluster
> is for.

OK.

Thanks,

Dejan

> --Greg

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
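To picture the "external file which specifies the mapping" idea from the
thread, the sketch below parses a node-to-outlet table that allows
several PDU/outlet pairs per node (e.g. dual power feeds). The file
format, hostnames, and outlet numbers are all hypothetical, invented for
illustration; they are not the rackpdu plugin's actual syntax.

```python
# Hypothetical mapping file format (NOT the real rackpdu syntax):
#   <nodename> <pdu-host> <outlet-number>
# A node may appear on several lines, one per PDU/outlet pair.
def parse_outlet_map(text):
    """Return {node: [(pdu_host, outlet), ...]} from the hypothetical format."""
    mapping = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        node, pdu, outlet = line.split()
        mapping.setdefault(node, []).append((pdu, int(outlet)))
    return mapping

example = """
# two PDUs, one outlet per node on each (names invented)
alpha pdu-a.mgmt 3
alpha pdu-b.mgmt 3
bravo pdu-a.mgmt 4
bravo pdu-b.mgmt 4
"""
print(parse_outlet_map(example)["alpha"])  # -> [('pdu-a.mgmt', 3), ('pdu-b.mgmt', 3)]
```

A stonith plugin working from such a table would then power off every
outlet listed for the target node, which is what "multiple PDU/outlet
combinations" per node would require.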
