Hi,

On Fri, Jan 11, 2008 at 08:09:26PM +0100, Andreas Mock wrote:
> Hi Dejan, hi Lukas,
> 
> this post is (probably) important to you.
> 
> > -----Original Message-----
> > From: [EMAIL PROTECTED] 
> > [mailto:[EMAIL PROTECTED] On behalf of 
> > Dejan Muhamedagic
> > Sent: Friday, 11 January 2008 18:02
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] Problem with configuration of STONITH device
> > 
> > 
> > Hi,
> > 
> > On Fri, Jan 11, 2008 at 04:45:24PM +0100, Lukáš Pecha wrote:
> > > Hello,
> > > does anyone have any experience with external/ibmrsa 
> > stonith device agent?
> 
> Yes, me.  :-)
> 
> 
> > >
> > > I have two servers and both of them have RSA II module, so 
> > I want to enable 
> > > STONITH on them. I have read the related topics at 
> > linux-ha.org, but it's 
> > > still not clear to me, how it works - when I create the 
> > stonith clone 
> > > device, do I have to create a primitive resource for each 
> > server or do I 
> > > have to somehow specify all of the nodes in one primitive resource?
> > 
> > Whichever you prefer. It is admittedly a bit confusing. If you
> > have only two nodes, then it may be preferable not to use clones,
> > but just define two stonith resources and make constraints which
> > wouldn't let them run on the node matching the hostname in the
> > resource.
> 
> Dejan,
> 
> you probably remember: we had a discussion about this, because it was
> said that stonithd on a node would refuse to call a stonith agent
> to kill its own node. Is this still true?

Yes.

> You wanted to have a look at this.
> Background was:
> 1) Normal setup:
> * Stonith-Primitive 1 to kill node 2
> * Stonith-Primitive 2 to kill node 1
> * Constraint for Primitive 1 to run on node 1 preferably
> * Constraint for Primitive 2 to run on node 2 preferably
> (assumption: when node 1 goes crazy, the stonith resource on node 2 can shoot it)
> 2) Monitoring of stonith Primitive 1 fails on node 1:
> * Primitive 1 will be moved to node 2. Yes, this should not happen and
> an administrator has to investigate, but meanwhile Primitive 1,
> NOW running on node 2, should still be able to shoot node 2 itself.
> 
> Dejan, can you remember?

Yes. One serious problem in this case is that the cluster can
never know whether the stonith operation was successful, which would
basically render the cluster unusable.
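
To make the placement rule concrete, here is a minimal Python sketch of the two-node setup described above. It is only a toy model, not Heartbeat code: the class, the function names and the node names are all made up for illustration, and it assumes nothing more than the rule stated in this thread (an agent is never asked to kill the node it runs on).

```python
# Toy model of the two-node stonith layout; all names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class StonithResource:
    name: str
    target: str  # the node this resource is configured to shoot


def allowed_nodes(resource, nodes):
    """Nodes where the resource may run: anywhere except its own target."""
    return [n for n in nodes if n != resource.target]


def can_fence(victim, placement):
    """True if a resource targeting `victim` runs on some other node.

    `placement` maps each StonithResource to the node it currently runs on.
    A resource running on its own target does not count, since the agent
    would have to kill the node it is running on.
    """
    return any(
        res.target == victim and node != victim
        for res, node in placement.items()
    )


nodes = ["node1", "node2"]
p1 = StonithResource("primitive1", target="node2")
p2 = StonithResource("primitive2", target="node1")

# Normal setup: each primitive runs opposite the node it shoots.
normal = {p1: "node1", p2: "node2"}
# After a monitor failure, primitive1 has been moved onto its own target.
after_move = {p1: "node2", p2: "node2"}
```

In this model can_fence("node2", normal) holds, while can_fence("node2", after_move) does not: once primitive1 sits on node2, nothing is left that could shoot node2. A location constraint that keeps each resource off its own target (allowed_nodes above) avoids ending up in that state.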

> > It is an extra package provided by IBM. A tad heavy though: it's
> > a Java application.
> 
> Yes, it slurps CPU cycles like a Bavarian...er, sorry..
> Bohemian slurps beer.  ;-)
> Worse: I regularly got errors while monitoring. Someone
> posted here that this is related to timing problems.

I can recall that it worked for me(tm). However, it is very
demanding in terms of memory/cpu and that's definitely not good
for stonith.

> > There's a relatively new alternative stonith
> > agent ibmrsa-telnet. You may want to try that one. It's available
> > with the 2.1.3 release.
> 
> The first release of my script, packaged with 2.1.3, is buggy.
> Sorry for that! I found the bug very quickly, but the updated version
> of the script that I posted seems never to have reached the maintainers.

Hmm, I probably picked the wrong version then. Buggy in the sense
that it won't work at all? Can you describe the bug?

> So, Dejan, could you please check in the differences?
> I have the (more) correct version attached.

Thanks, I'll update the repository.

> The second version has been running very well for some months now in a
> production environment.
> 
> 
> > 
> > > Please, if you have experience with this, could you please 
> > provide me with 
> > > some info and an example of the cib configuration for 
> > ibmrsa stonith agent?
> > 
> > At the bottom of the said ibmrsa-telnet you'll find a CIB snippet
> > defining one stonith resource.
> 
> IMPORTANT: 
> 1) The external stonith API allows an external stonith plugin
> to be responsible for shooting more than one node; the target node is
> passed as a parameter when the plugin is called.
> My external stonith plugin shoots exactly the one node configured via
> the CIB and ignores that parameter. I should probably add a check for that.

What exactly do you refer to? All parameters are defined by the
plugin itself. If that is the case, why should you ignore any of
them :)
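
If the parameter in question is the target host passed along with the action, the check Andreas mentions could look roughly like the sketch below. It is written in Python only for illustration and is not taken from ibmrsa-telnet. It assumes the plugin receives the action and the target host as its first two arguments and its configured node in an environment variable; the variable name "nodename" and both function names are assumptions.

```python
# Hypothetical target check for a single-node stonith plugin.
import sys

# Actions that act on a specific host and therefore need the check.
HOST_ACTIONS = ("reset", "on", "off")


def check_target(action, target, configured_node):
    """Return 0 if the request may proceed, 1 if it must be refused."""
    if action in HOST_ACTIONS and target.lower() != configured_node.lower():
        return 1
    return 0


def main(argv, env):
    """argv: [program, action, target]; env: mapping with the plugin config."""
    action = argv[1] if len(argv) > 1 else ""
    target = argv[2] if len(argv) > 2 else ""
    configured = env.get("nodename", "")  # assumed parameter name
    rc = check_target(action, target, configured)
    if rc != 0:
        print(
            f"refusing '{action}' for '{target}': "
            f"this plugin only controls '{configured}'",
            file=sys.stderr,
        )
    return rc
```

A real plugin would call something like main(sys.argv, os.environ) and pass the result to sys.exit(). Actions that take no host, such as a status check, pass through unchanged.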

> 2) The RSA board allows only one telnet session at a time. So if someone logs
> in, e.g. to check something, and HA starts a monitor cycle at the same time,
> the resource gets a monitor failure and will probably be moved.

Yes, that's one typical problem with this class of devices. But
there's nothing one can do about that.
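
The failure mode is easy to reproduce in a toy model. The Python sketch below is purely illustrative: the device class is a stand-in, not the RSA telnet interface, and every name in it is hypothetical. The retry wrapper does not remove the one-session limit. It only shows how a few retries might ride out a short administrative login, at the cost of a slower monitor.

```python
# Toy model of a management board that accepts one session at a time.
import time


class SingleSessionDevice:
    def __init__(self):
        self.busy = False

    def login(self):
        """Open a session; fails if another session is already open."""
        if self.busy:
            return False
        self.busy = True
        return True

    def logout(self):
        self.busy = False


def monitor(device):
    """One monitor cycle: succeeds only if a session can be opened."""
    if not device.login():
        return False
    device.logout()
    return True


def monitor_with_retry(device, attempts=3, delay=0.0, on_retry=None):
    """Retry the monitor a few times before reporting a failure.

    `on_retry` is a hook used here only to simulate the administrator
    logging out between attempts.
    """
    for attempt in range(attempts):
        if monitor(device):
            return True
        if on_retry is not None:
            on_retry(attempt)
        time.sleep(delay)
    return False


rsa = SingleSessionDevice()
rsa.login()                    # an administrator is logged in
failed_once = monitor(rsa)     # False: the board is busy
recovered = monitor_with_retry(
    rsa, attempts=3, on_retry=lambda attempt: rsa.logout()
)
```

Here failed_once is False because the session slot is taken, and recovered is True because the simulated administrator logs out before the second attempt. If the session stays open for longer than all the retries together, the monitor still fails.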

> Enhancements are welcome.
> 
> Dejan, is there a proper place to put these hints somewhere for others?
> In the file, man page, wiki?

wiki.linux-ha.org

Probably here: http://wiki.linux-ha.org/StonithAgents

There is also the idioms page where you could add your CIB example:

http://wiki.linux-ha.org/CIB/Idioms/

Cheers,

Dejan


> 
> Best regards
> Andreas Mock
> 
> 


> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems