Hi,

On Wed, May 05, 2010 at 09:54:00AM -0600, Greg Woods wrote:
> On Wed, 2010-05-05 at 16:21 +0200, Dejan Muhamedagic wrote:
> 
> > rackpdu also works over the network. The lights-out device has
> > its own network interface. Presumably both should be connected to
> > some management network. Where's the difference?
> 
> The difference is in redundancy. Both rackpdu and ipmilan use a network,
> but it's not the SAME network. So using both doesn't avoid all points of
> failure, but it avoids a NIC or cable being a *single* point of failure.

It's not a single point of failure, because something else must
fail first. Clusters are not supposed to protect against more than
one failure happening within a short period of time.

> > > it is unlikely that something would happen that causes
> > > only one of the servers to completely lose power other than human error
> > > (possibly a motherboard failure as well?)
> > 
> > That's an interesting question. Perhaps the server vendor can
> > tell.
> 
> I think there will always be hardware failure modes that would cause the
> server to be non-functional and the ipmilan stonith to fail. I do
> understand that these situations would be very rare. It isn't like this
> is a show-stopper problem. But I am still looking to see if it can be
> insulated against.
> 
> > 
> > You can have more than one stonith resource and they'll be tried
> > in a round-robin fashion until one succeeds.
> 
> Yes. This is what I have now, with ipmilan first and meatware second.
> I've tested this. Brute-force killing of heartbeat does result in an
> ipmilan stonith. Powering down one of the servers leaves the resources
> that were running on it in a non-running state, because the ipmilan
> stonith fails (the BMC loses power along with the rest of the server),
> but running the meatclient program can force the remaining server to
> take over once I verify that power to the other server is really gone.
> I would manually shut off its outlets on the PDU to make sure, but at
> least all of that can be done remotely from the server that is still
> up. Better still, of course, would be to have the remaining server
> shut off the outlets and take over the resources automatically, which
> is what I'm aiming for here.
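
Just for the archives, roughly what that looks like in the crm
shell. The parameter names are from memory, so check "stonith -t
ipmilan -n" and "stonith -t meatware -n" for the authoritative
list on your version; addresses and credentials are of course
made up:

    primitive st-ipmi-node1 stonith:ipmilan \
            params hostname=node1 ipaddr=10.0.0.11 \
                   auth=md5 priv=operator login=admin password=secret
    primitive st-meat stonith:meatware \
            params hostlist="node1 node2"
    # don't let a node run the device that is supposed to fence it
    location l-st-ipmi-node1 st-ipmi-node1 -inf: node1

The manual confirmation you describe is then just

    meatclient -c node1

once you're certain the power is really gone.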
> 
> > The plugin picks the outlet in one of two ways:
> > 
> > - from the device itself, provided that the outlet name matches
> >   the node name
> > - from the external file which specifies the mapping
> 
> I expect that allowing multiple PDU/outlet combinations to be specified
> in the config file is the only way to do this. 

Well, there's nothing wrong with more than one outlet having the
same name.
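
You can also poke at the plugin from the command line before
touching the cluster configuration. The parameter names below are
placeholders -- the first command prints the real ones for your
plugin version:

    # list the parameters external/rackpdu expects
    stonith -t external/rackpdu -n

    # then a live test against an outlet you can afford to switch
    stonith -t external/rackpdu pduip=10.0.0.5 community=private -T off node1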

> > I can help with modifying the plugin if you'll do the testing.
> 
> I'd be more than happy to do some testing. That's what the test cluster
> is for.

OK.
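
Since you'll be testing a modified plugin: the external/* plugins
are plain shell scripts, on most installations under
/usr/lib/stonith/plugins/external (the path may differ on your
distribution). If memory serves, they're invoked with the action as
the first argument and get their parameters as environment variables
named after the config names, so you can drive a work-in-progress
copy by hand (parameter names again placeholders):

    pduip=10.0.0.5 community=private ./rackpdu gethosts
    pduip=10.0.0.5 community=private ./rackpdu off node1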

Thanks,

Dejan

> --Greg
> 
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
