Hi,

On Tue, Dec 09, 2008 at 04:12:30AM +1100, Simon Tideswell wrote:
> Hello
>
> Sorry about the vagueness of this post: I don't have all of the
> log files to hand at this moment.
>
> I have a two node HA cluster on SLES 10 SP2 (64 bit). I am
> having problems getting the riloe plugin to work properly. The
> riloe script works fine when I run it from the command line (i.e.
> when I run "riloe status" I get an RC of 0). But when I run
> riloe from a clone resource, the resource will not start and the
> ha-log indicates an error and an RC of 6. I think the error
> indicated an empty "hostlist", which is not actually true, as
> this parameter is definitely populated. Having read through the
> riloe script I cannot see anywhere that it returns an RC of 6, so
> I don't know where that is coming from. I saw another post that
> requested a full list of return codes (and meanings) for
> stonithd, but I don't know if this was ever answered.
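One possible explanation for an RC that riloe itself never returns: resource agents report status with the standard OCF exit codes, and the code in ha-log may be one of those rather than the script's own exit status. That is an assumption, not something confirmed by the logs here. The table below is the standard OCF set; describe_rc is just a hypothetical helper for looking a code up.

```python
# Standard OCF resource agent exit codes.
# Assumption: the "RC of 6" in ha-log is such a code reported for the
# stonith resource, not the exit status of the riloe script itself.
OCF_EXIT_CODES = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
}


def describe_rc(rc: int) -> str:
    """Return the symbolic OCF name for an exit code, or a fallback string."""
    return OCF_EXIT_CODES.get(rc, f"unknown rc {rc}")


if __name__ == "__main__":
    for rc in (0, 6, 42):
        print(rc, describe_rc(rc))
```

Read this way, RC 6 would mean a configuration problem, which is consistent with an empty host list.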
There should be a warning in the logs (from stonithd) which says
that the host list is empty.

> Funny thing is I have two nodes (let's say A and B), each with
> an HP ILO. There are two clone resources, one for each ILO, and
> for each clone I have set clone_node_max = 1 and clone_max = 2.
> The stonith resource for ILO-B starts on node A, but the stonith
> resource for ILO-A will not start on node A. They use the same
> riloe plugin and it works when run manually. Note that node B
> has not been built yet (i.e. no OS), but it is powered on. This
> behaviour (of stonith for ILO-A not being allowed to run on
> node A alone) might be entirely by design, but I don't think it
> is documented, so it is confusing me greatly.

Every stonith resource start implies a status check. In other
words, a resource starts only if the status check passes. I'm not
sure if that's the reason here, though. The logs should say.

There is also a bug, which I found only yesterday, that may
sometimes influence hostlist filtering, but it occurs very seldom
and only on a freshly booted host. So far nobody has reported it.

> If I change the
> "hostlist" parameter of the ILO-A clone to be something other
> than "A", then it runs fine, so this seems to support this
> notion, but I was just trying to get some feedback from the
> mailing list on this. I suppose it is reasonable that stonith
> won't run if it is only able to suicide and no other node can
> kill it, but if the return codes were documented or this
> behaviour was identified in the documentation, it would make
> things so much easier. Of course I might be barking up the
> wrong tree and there may be another reason for stonithd for
> ILO-A on node A not starting; if anyone has any ideas, it
> would be much appreciated.

Since you run SLES, you may open an L3 call. Also, make sure
that you run the latest SP2 release: external/riloe was not in
very good shape until this summer.
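A rough model of the start behaviour discussed above may help. It is a sketch under stated assumptions, not stonithd's actual code. It assumes that the local node is dropped from the configured hostlist (a node cannot shoot itself), that an empty result produces the "host list is empty" warning, and that the start otherwise succeeds only if the status check returns 0. The names filter_hostlist and start_stonith_resource are hypothetical, and so is mapping the empty-list case to 6.

```python
import re
from typing import Callable, List


def filter_hostlist(hostlist: str, local_node: str) -> List[str]:
    """Hypothetical: split the hostlist and drop the local node.

    Assumes hosts are separated by whitespace and/or commas and that
    node names compare case-insensitively.
    """
    hosts = [h for h in re.split(r"[\s,]+", hostlist) if h]
    return [h for h in hosts if h.lower() != local_node.lower()]


def start_stonith_resource(hostlist: str, local_node: str,
                           status_check: Callable[[], int]) -> int:
    """Hypothetical start: fail on an empty host list, else require status rc 0."""
    hosts = filter_hostlist(hostlist, local_node)
    if not hosts:
        print(f"WARNING: host list is empty after removing {local_node}")
        return 6  # assumption: reported like OCF_ERR_CONFIGURED
    rc = status_check()  # e.g. what "riloe status" returns
    return 0 if rc == 0 else 1  # OCF_SUCCESS or OCF_ERR_GENERIC


if __name__ == "__main__":
    # ILO-A clone on node A with hostlist "A": nothing left to fence.
    print(start_stonith_resource("A", "A", lambda: 0))
    # ILO-B clone on node A with hostlist "B": start depends on status.
    print(start_stonith_resource("B", "A", lambda: 0))
```

Under these assumptions, hostlist "A" on node A fails before the status check is ever run. That would match the observation that ILO-A starts once its hostlist names some other node.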
Thanks,

Dejan

> Simon
>
> Network Ten Pty Ltd ABN 91 052 515 250

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
