Andrew Beekhof wrote:
>
> On May 14, 2007, at 2:36 PM, Alan Robertson wrote:
>
>> Dejan Muhamedagic wrote:
>>> On Fri, May 11, 2007 at 10:04:11AM +0200, Th.Paschy, hepasoft oHG wrote:
>>>> Hi all,
>>>>
>>>> I am a new user of heartbeat.
>>>>
>>>> I configured an active/passive cluster with two Dell PE1900 servers
>>>> based on SuSE Linux with heartbeat 2.0.8 (R1-style). After some
>>>> problems with DRBD resources after a cold reset of the master node
>>>> (locks that were not removed), which were fixed by Phillipp Reissner
>>>> last weekend, all works fine.
>>>>
>>>> Next I was looking for a stonith module for the Dell Remote Access
>>>> Controller DRAC 5, but I found only one for the drac3. Inside the
>>>> drac5 the layout of the embedded web interface has been changed, so
>>>> the drac3 module won't work.
>>>>
>>>> So I've written my own module, strongly based on the apcmaster
>>>> module. The module uses the SM-CLP command line interface of the
>>>> drac5 via telnet. I'm really not a good C programmer, but it works
>>>> perfectly.
>>>>
>>>> But there is one problem (which the drac3 module would have too):
>>>> if the server loses its power connection, the Remote Access Card
>>>> won't be accessible, the fencing process will never stop, and so no
>>>> resource takeover takes place unless you manually take corrective
>>>> action. So a redundant power supply is strongly recommended.
>>>>
>>>> I've seen that other users are looking for a drac5 module too, so I've
>>>> attached the source of the drac5 module.
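The SM-CLP interaction described above can be sketched roughly as follows. This is a hypothetical illustration, not the attached plugin's actual code: the `smclp_command` helper and the `/system1` target are assumptions, while the `start`/`stop`/`reset` verbs follow the DMTF SM-CLP convention that the DRAC5 command line is based on. A real plugin would open a telnet session, authenticate, and send the line this helper builds.

```python
# Hypothetical sketch of building the SM-CLP lines a DRAC5 stonith
# plugin might send over its telnet session. Names and the exact
# target path are assumptions; verify against the DRAC5 documentation.

def smclp_command(action: str, target: str = "/system1") -> str:
    """Map a generic stonith action to an SM-CLP command line.

    SM-CLP uses verb + target syntax, e.g. "stop /system1" to power
    a managed system off.
    """
    verbs = {"on": "start", "off": "stop", "reset": "reset"}
    if action not in verbs:
        raise ValueError(f"unsupported action: {action}")
    return f"{verbs[action]} {target}"
```

For a reset of the managed system, such a plugin would send `reset /system1` and then wait for the card's prompt to confirm the command was accepted.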
>>>
>>> Thanks for the contribution! Alan will probably want to do the usual
>>> legal chanting.
>>
>> I'll send you an email on this.
>>
>>>> I would be glad if someone could tell me a way to handle the
>>>> described problem of never-ending fencing if access to the drac is
>>>> lost (because of power loss or network failure).
>>>
>>> Unfortunately, there's no workaround. If heartbeat cannot stonith
>>> the node, it will go on trying forever. If stonith is configured,
>>> we must make sure that the node is rebooted or shut down. If the
>>> stonith device is not accessible, well, too bad. The UPS-based
>>> stonith devices are definitely preferable to the lights-out
>>> embedded kind.
>>
>> Sometime in the past, I asked Andrew for a feature which would allow the
>> takeover to proceed after a certain number of failed STONITHs, if things
>> were configured to allow that. I don't remember whether he did that
>> or not.
>
> It never got implemented.
>
>>
>> For these kind of cases, it seems like a good thing.
>
> I disagree - remember the "you can't make it up" part of "You don't
> know what you don't know."
> In general, you don't know that the node is dead, only that the stonith
> device is...
>
> In the case of this plugin, apparently, the stonith device being dead
> implies the host is also.
Or a network failure, or other things. It's not a certain thing.
Although one wishes not to make things up, and avoids it when one can,
for some configurations it's better than sitting on one's hands. And
for split-site configurations, leaving the feature out would mean
leaving out STONITH completely, when one should at least _try_ to use
it. In the split-site case, all stonith plugins would be unusable if
the inter-site link fails.
> This makes me inclined to think that the plugin is therefore a "better"
> place to implement such behavior.
>
> It also means that such a feature could be turned on for individual
> stonith devices rather than unilaterally - which may not be a good
> idea, especially in mixed-stonith-device environments.
I can see the logic to this. But, of course, it is more work to
implement it 20 times in 20 plugins than in one place. And we would
probably end up with 20 slightly different criteria for detecting the
condition and 4 or 5 different ways to specify it in the configuration.
And, of course, since most of the plugin authors don't work on the
project, and many no longer have access to the necessary hardware, it's
probably impossible to implement in practice in the plugins.
I suppose it could be implemented in the stonith daemon instead.
Unfortunately, it has no access to configuration information, so
specifying it to stonithd would be painful compared to specifying it
to the CRM or the plugins.
If stonithd were a resource, then configuring it would be easy.
[although having the stonith daemon be a resource would create its own
problems].
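Wherever it ends up living, the feature itself is simple. Here is a minimal sketch (an illustration of the idea under discussion, not heartbeat code; `fence_fn` and all names are assumptions) of a bounded-retry fencing loop: retry forever by default, but if a failure limit is configured and reached, give up and let the caller decide whether to riskily presume the node dead and proceed with takeover.

```python
# Sketch of "allow takeover after N failed STONITHs". Not real
# heartbeat/stonithd code: fence_fn stands in for one attempt to
# fence the node via the configured stonith device.

def fence_with_retry_limit(fence_fn, max_failures=None):
    """Attempt fencing repeatedly.

    Returns True once fencing succeeds. If max_failures is set and
    that many consecutive attempts fail, returns False so the caller
    can (riskily) presume the node dead. max_failures=None keeps
    today's behavior: retry forever.
    """
    failures = 0
    while True:
        if fence_fn():
            return True
        failures += 1
        if max_failures is not None and failures >= max_failures:
            return False
```

The design question in the thread is exactly where `max_failures` would be configured: per plugin, in stonithd (which lacks configuration access), or in the CRM.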
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems