Hi,

On Fri, Jun 26, 2009 at 04:33:30PM +0200, Jan Kalcic wrote:
> Andrew Beekhof wrote:
> > On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic wrote:
> >
> >> Andrew Beekhof wrote:
> >>
> >>> On Fri, Jun 26, 2009 at 10:55 AM, Jan wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>> a very boring issue with stonith using the plugin external/riloe (never
> >>>> used it before). Whenever I try to simulate a split-brain condition
> >>>> (using iptables) in order to test stonith, both nodes kill each other.
> >>>> Not exactly what I expected.
> >>>
> >>> Sure it is
> >>>
> >>> [snip]
> >>>
> >>>> <nvpair id="nvpair-56c027e0-80c8-49a7-9cf1-1af593a9391f"
> >>>> name="no-quorum-policy" value="ignore"/>
> >>>
> >>> With that option, this is exactly what I'd expect.
> >>>
> >>> Have a read of:
> >>> http://ourobengr.com/ha
> >>
> >> From what I understood, probably wrongly, that should be the right option
> >> for a two-node cluster, where a single node can't have quorum; that's
> >> why it should be "ignore". Is this wrong?
> >>
> >> I had already taken a quick look at that document (I love that picture,
> >> btw) but not as closely as now. I am going to review my timeout for sure.
> >> Anyway, I don't get any hint about the quorum setting. Should it be
> >> something other than "ignore"?
> >
> > No, that's the right value for a two-node cluster.
> > But that value can also lead to the behavior you described.
> >
> > Though normally one side shoots the other before it can shoot back.
>
> This does not happen. The reason could be that with iLO the node is not
> actually shot but gracefully shut down. For this reason the shot node has
> plenty of time to shoot the other side back. Makes sense?
Yes, it does.

> In this case I would need to stonith the other side not gracefully but
> forcefully, like unplugging the cable, but it seems this is not available
> with the riloe plugin, is it?

Yes, it is. You should use the latest version of the plugin;
ilo_powerdown_method should be set to "power", AFAIK. I think that
does a "cable pull" operation.

If you still find a problem with nodes shooting each other at the
same time, please file a bugzilla. I'm not sure whether that can be
fixed; it depends on the timing of the communication with the device.

Thanks,

Dejan

> Thanks,
> Jan
>
> >> My issue isn't exactly the deathmatch described there, first of all
> >> because the openais daemon is disabled at boot and secondly because the
> >> stonith action is poweroff. Rather, it is a strange situation where both
> >> nodes kill themselves and they both shut down.
> >
> > They'd both be killing each other.
> >
> >> I wonder if it is a timeout issue. My timeout here for the stonith
> >> resource is 15s. Does that mean that when the first node sends a stonith
> >> to the second one and that node can't shut itself down within 15s, it
> >> stoniths the first node?
> >
> > No. This is unrelated.
> >
> > _______________________________________________
> > Linux-HA mailing list
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
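For reference, here is a minimal sketch of a CIB stonith resource using external/riloe with the power-down method Dejan suggests. Only ilo_powerdown_method="power" comes from this thread; the resource ids, the other parameter names (hostlist, ilo_hostname, ilo_user, ilo_password) and all values are assumptions for illustration, so check them against the plugin's own parameter list (for example via `crm ra info stonith:external/riloe`) before using anything like this.

```xml
<!-- Hypothetical stonith resource that fences node2 through its iLO.
     Every id, host name and credential below is a placeholder. -->
<primitive id="st-ilo-node2" class="stonith" type="external/riloe">
  <instance_attributes id="st-ilo-node2-params">
    <!-- assumed parameter: the cluster node this device can fence -->
    <nvpair id="st-ilo-node2-hostlist" name="hostlist" value="node2"/>
    <!-- assumed parameters: iLO address and login credentials -->
    <nvpair id="st-ilo-node2-ilo_hostname" name="ilo_hostname" value="node2-ilo.example.com"/>
    <nvpair id="st-ilo-node2-ilo_user" name="ilo_user" value="Administrator"/>
    <nvpair id="st-ilo-node2-ilo_password" name="ilo_password" value="changeme"/>
    <!-- from the thread: "power" should cut power instead of asking the OS to shut down -->
    <nvpair id="st-ilo-node2-ilo_powerdown_method" name="ilo_powerdown_method" value="power"/>
  </instance_attributes>
</primitive>
```

A matching resource would be needed for node1, and each such resource is usually kept off the node it fences with a location constraint.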
