On May 14, 2007, at 2:36 PM, Alan Robertson wrote:

Dejan Muhamedagic wrote:
On Fri, May 11, 2007 at 10:04:11AM +0200, Th.Paschy, hepasoft oHG wrote:
Hi all,

I am a new user of heartbeat.

I configured an active/passive cluster with two Dell PE1900s based on SUSE Linux with heartbeat 2.0.8 (R1-style). After some problems with DRBD resources after a cold reset of the master node (locks not removed), which were fixed
by Philipp Reisner last weekend, everything works fine.

Next I looked for a STONITH module for the Dell Remote Access Controller DRAC 5, but I found only one for the DRAC 3. In the DRAC 5 the layout of the embedded web interface has changed, so the DRAC 3 module
won't work.

So I've written my own module, strongly based on the apcmaster module. The module uses the SM-CLP command line interface of the DRAC 5 via telnet. I'm
really not a good C programmer, but it works perfectly.

But there is one problem (which the DRAC 3 module has as well): if the server loses its power connection, the remote access card becomes
inaccessible, the fencing process never completes, and no
resource takeover takes place unless you manually take corrective action.
A redundant power supply is therefore strongly recommended.

I've seen that other users are looking for a drac5 module too, so I've
attached the source of the drac5 module.

Thanks for the contribution! Alan will probably want to do the usual
legal chanting.

I'll send you an email on this.

I would be glad if someone could tell me how to handle the described problem of never-ending fencing when access to the DRAC is
lost (because of power loss or network failure).

Unfortunately, there's no workaround. If heartbeat cannot stonith
the node, it will go on trying forever. If stonith is configured,
we must make sure that the node is rebooted or shut down. If the
stonith device is not accessible, well, too bad. The UPS-based
stonith devices are definitely preferable to the lights-out
embedded kind.

Sometime in the past, I asked Andrew for a feature which would allow the takeover to proceed after a certain number of failed STONITHs, if things were configured to allow that. I don't remember whether he did that or not.

It never got implemented.


For these kinds of cases, it seems like a good thing.

I disagree - remember the "you can't make it up" part of "You don't know what you don't know". In general, you don't know that the node is dead, only that the stonith device is...

In the case of this plugin, apparently, the stonith device being dead implies the host is also. This makes me inclined to think that the plugin is therefore a "better" place to implement such behavior.

It also means that such a feature could be turned on for individual stonith devices rather than unilaterally, which may not be a good idea, especially in mixed-stonith-device environments.


_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
