Hi,

On Tue, Jan 15, 2008 at 11:30:41AM +0100, Andreas Mock wrote:
> > -----Original Message-----
> > From: General Linux-HA mailing list <[email protected]>
> > Sent: 15.01.08 10:23:03
> > To: General Linux-HA mailing list <[email protected]>
> > Subject: Re: [Linux-HA] URGENT: Problem with configuration of STONITH device
> > > probably you can remember. We had a discussion about that, because it was
> > > said that the stonithd on a node would refuse to call a stonith
> > > agent to kill its own node. Is this still true?
> > 
> > Yes.
> 
> Will that change?

No.

> > > running NOW on node 2 should be possible to shoot itself.
> > > 
> > > Dejan, can you remember?
> > 
> > Yes. One serious problem in this case is that the cluster can
> > never know if the stonith operation was successful. Which would
> > basically render the cluster unusable.
> 
> So, the answer to the question above is probably NO by design?

Yes.

> > > > It is an extra package provided by IBM. A tad heavy though: it's
> > > > a Java application.
> > > 
> > > Yes, it slurps CPU cycles like a Bavarian...äh sorry..
> > > Bohemian slurps beer.  ;-)
> > > Worse: I got errors while monitoring regularly. Someone
> > > posted here that this is related to timing problems.
> > 
> > I can recall that it worked for me(tm). However, it is very
> > demanding in terms of memory/cpu and that's definitely not good
> > for stonith.
> 
> I had monitoring running once per minute. Approx. once per day I got
> an error. The error message was not very helpful as it was a java
> stack trace.

OK. Though I'd find a once per minute monitor interval a bit
excessive. Perhaps once an hour or so would be more appropriate.

> > Hmm, I probably picked the wrong version then. Buggy in the sense
> > that it won't work at all? Can you describe the bug?
> 
> I'm sorry, but it's the worst case. It would not stonith and that's really
> the primary goal.  :-(
> I found out with my tests that the first parameter after the
> start/stop/restart action (i.e. the second parameter overall) is the name
> of the node to stonith. My first script checked for exactly ONE parameter,
> which is obviously not correct in this case.
> 
> > 
> > > So, Dejan, could you please check in the differences?
> > > I have the (more) correct version attached.
> > 
> > Thanks, I'll update the repository.
> 
> Thank you. Sorry for the mess.

No problem. Thank you for the contribution!

> > > IMPORTANT: 
> > > 1) The external stonith API allows an external stonith plugin
> > > to be responsible for shooting more than one node. The node is given
> > > as a parameter when the plugin is called.
> > > My external stonith plugin shoots exactly the one node configured via
> > > CIB. It ignores the parameter. Probably I should add a check for that.
> > 
> > What exactly do you refer to? All parameters are defined by the
> > plugin itself. If that is the case, why should you ignore any of
> > them :)
> 
> See the above answer. That's something I don't understand. I couldn't
> find anything about it in the documentation of the external stonith plugin API.
> When a stonith action is triggered the plugin is called with the action name
> and with the node name to stonith as second parameter.
> I guess this is done to be able to create a configuration with lists of hosts
> so that the plugin-cibconfig-combination can shoot more than one node.
> But I'm not sure. In my first attempt I thought that ALL parameters are
> given as environment variables and ONLY the action is given as parameter.
> 
> But probably ...better hopefully...you can enlighten me. :-)

Yes, an action plus node is passed. Some, or actually most
devices until recently, can handle more than one node.
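
To make the calling convention concrete, here is a rough sketch of the
argument check discussed above, written in Python only for illustration.
It assumes the plugin is called as "<action> [<node>]" and that its
configured parameters arrive as environment variables. The action names
in NODE_ACTIONS, the variable name "hostname" and the function
parse_call are made up for this example and are not taken from the
plugin API documentation.

```python
# Hypothetical names for the node-level actions; the thread itself only
# says that "an action plus node is passed".
NODE_ACTIONS = {"on", "off", "reset"}


def parse_call(argv, env):
    """Return (action, node) for one plugin invocation.

    argv -- arguments after the script name: [action] or [action, node]
    env  -- mapping with the configured parameters (assumed to arrive as
            environment variables; "hostname" is a made-up name)
    """
    if not argv:
        raise ValueError("missing action")
    action = argv[0]
    if action not in NODE_ACTIONS:
        # In this sketch every other action carries no node argument.
        return action, None
    if len(argv) != 2:
        raise ValueError(f"{action} needs exactly one node argument")
    node = argv[1]
    # A single-node plugin should refuse a node it is not configured for
    # instead of silently ignoring the argument.
    if node != env.get("hostname"):
        raise ValueError(f"{node!r} is not managed by this plugin")
    return action, node
```

A plugin for a single node would call something like
parse_call(sys.argv[1:], os.environ) and exit non-zero on ValueError. A
multi-node plugin would compare the node against its configured host list
instead of a single name.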

> > > 2) The RSA board allows only one telnet session at a time. So if someone
> > > logs in to e.g. check something and at the same time a monitor cycle is
> > > started by HA, the resource gets a monitor failure and will probably be
> > > moved.
> > 
> > Yes, that's one typical problem with this class of devices. But
> > there's nothing one can do about that.
> 
> Let a node shoot itself. (See answers above). It's not good, but better than 
> no attempt of shooting.

Well, that probably won't change. If you want to express your
sentiment on the matter, here's the bugzilla:

http://developerbugs.linux-foundation.org/show_bug.cgi?id=1752

Cheers,

Dejan


> 
> 
> Best regards
> Andreas Mock
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
