Hi,

On Tue, Dec 09, 2008 at 04:12:30AM +1100, Simon Tideswell wrote:
> Hello
>  
> Sorry about the vagueness of this post: I don't have all of the
> log files to hand at this moment.
>  
> I have a two node HA cluster on SLES 10 SP2 (64 bit). I am
> having problems getting the riloe plugin to work properly. The
> riloe script works fine when I run it from the command line (i.e.
> when I run "riloe status" I get a RC of 0). But when I run the
> riloe from a clone resource the resource will not start and the
> ha-log indicates an error and a RC of 6 - I think the error
> indicated an empty "hostlist" which is not actually true as
> this parameter is definitely populated. Having read through the
> riloe script I cannot see anywhere that it returns a RC of 6, so
> I don't know where that is coming from. I saw another post that
> requested a full list of return codes (and meanings) for
> stonithd, but I don't know if this was ever answered.

There should be a warning in the logs (from stonithd) which says
that the host list is empty.

> Funny thing is I have two nodes (let's say A and B), each with
> a HP ILO. There are two clone resources, one for each ILO and
> for each clone I have set clone_node_max = 1 and clone_max = 2.
> The stonith resource for ILO-B starts on node A but the stonith
> resource for ILO-A will not start on node A - they use the same
> riloe plugin and it works when run manually. Note that node B
> has not been built yet (i.e. no OS) but it is powered on. This
> behaviour (of stonith for ILO-A not being allowed to run on
> node A alone) might be entirely by design, but I don't think it
> is documented so it is confusing me greatly.

Every stonith resource start implies a status check. In other
words, a resource starts only if the status check passes. I'm
not sure that's the reason here, though; the logs should say.
There is also a bug, which I found only yesterday, that can
occasionally affect hostlist filtering, but it occurs very rarely
and only on a freshly booted host. So far nobody has reported it.
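
To illustrate the start/status relationship, here is a minimal
Python sketch. It only models the behaviour and is not stonithd's
actual code: the function names are hypothetical, and the failing
return code is an arbitrary non-zero value chosen for illustration.

```python
from typing import Callable


# Hypothetical model: "starting" a stonith resource first runs the
# plugin's status check and succeeds only when that check returns 0.
def start_resource(status_check: Callable[[], int]) -> bool:
    """Return True if the resource may start, i.e. the status check passed."""
    rc = status_check()
    return rc == 0


def healthy_device() -> int:
    # Stand-in for a device whose status check succeeds (RC 0).
    return 0


def unreachable_device() -> int:
    # Stand-in for a failing status check; the value 1 is arbitrary.
    return 1


if __name__ == "__main__":
    print(start_resource(healthy_device))
    print(start_resource(unreachable_device))
```

So a plugin that works from the command line can still fail to
start as a resource if the status check fails in that context.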

> If I change the
> "hostlist" parameter of the ILO-A clone to be something other
> than "A" then it runs fine - so this seems to support this
> notion but I was just trying to get some feedback from the
> mailing list on this. I suppose it is reasonable that stonith
> won't run if it is only able to suicide and no other node can
> kill it but if the return codes were documented or this
> behaviour was identified in the documentation it would make
> things so much easier. Of course I might be barking up the
> wrong tree and there may be another reason for stonithd for
> ILO-A on node A not starting and if anyone has any ideas it
> would be much appreciated.

Since you run SLES, you may open an L3 call.

Also, make sure that you run the latest released SP2 code:
external/riloe was not in very good shape until this summer.

Thanks,

Dejan

> Simon
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
