Hi,

On Fri, Jun 26, 2009 at 04:33:30PM +0200, Jan Kalcic wrote:
> Andrew Beekhof wrote:
> > On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic<[email protected]> wrote:
> >   
> >> Andrew Beekhof wrote:
> >>     
> >>> On Fri, Jun 26, 2009 at 10:55 AM, Jan<[email protected]> wrote:
> >>>
> >>>       
> >>>> Hi,
> >>>>
> >>>> a very boring issue with stonith using the plugin external/riloe
> >>>> (never used it). Whenever I try to simulate a split-brain condition
> >>>> (using iptables) in order to test stonith, both nodes kill each
> >>>> other. Not exactly what I expected.
> >>>>
> >>>>         
> >>> Sure it is
> >>>
> >>> [snip]
> >>>
> >>>
> >>>       
> >>>>        <nvpair id="nvpair-56c027e0-80c8-49a7-9cf1-1af593a9391f"
> >>>> name="no-quorum-policy"
> >>>> value="ignore"/>
> >>>>
> >>>>         
> >>> With that option, this is exactly what I'd expect.
> >>>
> >>> Have a read of:
> >>>    http://ourobengr.com/ha
> >>>
> >>>       
> >> From what I understood, probably wrongly, that should be the right
> >> option for a two-node cluster, where a single node can't have quorum;
> >> that's why it should be "ignore". Is this wrong?
> >>
> >> I had already taken a quick look at that document (I love that picture
> >> btw) but not as deeply as now. I am going to review my timeout for sure.
> >> Anyway, I don't get any hint about the quorum setting. Should it be
> >> different from "ignore"?
> >>     
> >
> > No, that's the right value for a two-node cluster.
> > But that value can also lead to the behavior you described.
> >
> > Though normally one side shoots the other before it can shoot back.
> >   
> This does not happen. The reason could be that, using iLO, the node is
> not actually shot but gracefully shut down. For this reason the shot
> node has all the time to shoot the other side back. Make sense?

Yes, it does.

> In this case I would need to stonith the other side not gracefully but
> forcefully, like unplugging the cable, but it seems this is not
> available with the riloe plugin, is it?

Yes, it is available. You should use the latest version of the plugin.

AFAIK, ilo_powerdown_method should be set to "power". I think that
does a "cable pull" operation instead of a graceful shutdown. If you
still find a problem with nodes shooting each other at the same time,
please file a bugzilla. I'm not sure whether that can be fixed; it
depends on the timings when talking to the device.
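
For illustration only, a stonith resource with that parameter set
might look roughly like the sketch below. The ids, host names and
credentials are made-up placeholders, and the parameter names other
than ilo_powerdown_method are given from memory, so please check them
against the output of "stonith -t external/riloe -n" for your version
of the plugin.

```xml
<!-- Sketch: fences node2 through its iLO board (all values hypothetical) -->
<primitive id="stonith-node2" class="stonith" type="external/riloe">
  <instance_attributes id="stonith-node2-params">
    <!-- cluster node this device is able to fence -->
    <nvpair id="stonith-node2-hostlist" name="hostlist" value="node2"/>
    <!-- address of that node's iLO interface (placeholder) -->
    <nvpair id="stonith-node2-ilo-hostname" name="ilo_hostname"
            value="node2-ilo.example.com"/>
    <nvpair id="stonith-node2-ilo-user" name="ilo_user" value="CHANGE_ME"/>
    <nvpair id="stonith-node2-ilo-password" name="ilo_password"
            value="CHANGE_ME"/>
    <!-- "power" asks for an immediate power cut, not a graceful shutdown -->
    <nvpair id="stonith-node2-ilo-powerdown" name="ilo_powerdown_method"
            value="power"/>
  </instance_attributes>
</primitive>
```

A second resource of the same shape, with hostlist pointing at the
other node, would be needed so that each node can fence its peer.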

Thanks,

Dejan



> Thanks,
> Jan
> >> My issue isn't exactly the deathmatch described there, first of all
> >> because the openais daemon is disabled at boot and secondly because
> >> the stonith policy is poweroff. Rather, it is a strange situation
> >> where both nodes kill themselves and they both shut down.
> >>     
> >
> > They'd both be killing each other.
> >
> >   
> >> I wonder if it is a timeout issue. My timeout here for the stonith
> >> resource is 15s. Does it mean that when a stonith is sent by the first
> >> node to the second one and this node can't shut itself down in 15s, it
> >> stoniths the first node?
> >>     
> >
> > No.  This is unrelated
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
> >   
> 
