Dejan Muhamedagic wrote:
> Hi,
>
> On Fri, Jun 26, 2009 at 04:33:30PM +0200, Jan Kalcic wrote:
>> Andrew Beekhof wrote:
>>> On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic<[email protected]> wrote:
>>>> Andrew Beekhof wrote:
>>>>> On Fri, Jun 26, 2009 at 10:55 AM, Jan<[email protected]> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> a very boring issue with stonith using the plugin external/riloe
>>>>>> (never used it). Whenever I try to simulate a split-brain condition
>>>>>> (using iptables) in order to test stonith, both nodes kill each
>>>>>> other. Not exactly what I expected.
>>>>>
>>>>> Sure it is
>>>>>
>>>>> [snip]
>>>>>
>>>>>> <nvpair id="nvpair-56c027e0-80c8-49a7-9cf1-1af593a9391f"
>>>>>> name="no-quorum-policy"
>>>>>> value="ignore"/>
>>>>>
>>>>> With that option, this is exactly what I'd expect.
>>>>>
>>>>> Have a read of:
>>>>> http://ourobengr.com/ha
>>>>
>>>> From what I understood, probably wrongly, that should be the right
>>>> option for a two-node cluster, where a single node can't have quorum;
>>>> that's why it should be "ignore". Is this wrong?
>>>>
>>>> I had already taken a quick look at that document (I love that picture
>>>> btw) but not as deeply as now. I am going to review my timeout for sure.
>>>> Anyway, I don't get any hint about the quorum setting. Should it be
>>>> different than "ignore"?
>>>
>>> No, that's the right value for a two-node cluster.
>>> But that value can also lead to the behavior you described.
>>>
>>> Though normally one side shoots the other before it can shoot back.
>>
>> This does not happen. The reason could be that using iLO the node is not
>> actually shot but gracefully shut down. For this reason the shot node has
>> all the time to shoot the other side back. Make sense?
>
> Yes, it does.
>
>> In this case I would need to stonith the other side not gracefully but
>> strongly like unplugging the cable but it seems this is not available
>> with the riloe plugin, is it?
>
> Yes, it is. You should use the latest version of the plugin.
I checked the plugin's version and it seems to be the very latest one. It
is the one installed with SLES11-HA. A diff against the plugin available on
the openSuSE build service for openSuSE 11.1 reports they are the same.

> ilo_powerdown_method should be set to power, AFAIK. I think that
> that does a "cable pull" operation. If you still find a problem
> with nodes shooting each other at the same time, please file a
> bugzilla. I'm not sure if that can be fixed, depends on the
> timings when talking to the device.

I will try the power option in the next few days. What confuses me is the
description below, which I extracted from the plugin: "power" takes longer
than "button". I would expect it to shoot the node immediately, so that the
node cannot stonith back.

<shortdesc lang="en">Power down method</shortdesc>
<longdesc lang="en">
The method to powerdown the host in question.
* button - Emulate holding down the power button
* power - Emulate turning off the machines power

NB: A button request takes around 20 seconds. The power method
about half a minute.

Thanks,
Jan

> Thanks,
>
> Dejan
>
>> Thanks,
>> Jan
>>
>>>> My issue isn't exactly the deathmatch described there, first of all
>>>> because the openais daemon is disabled at boot and secondly because
>>>> this stonith policy is poweroff. Rather, it is a strange situation
>>>> where both nodes kill themselves and they both shut down.
>>>
>>> They'd both be killing each other.
>>>
>>>> I wonder if it is a timeout issue. My timeout here for the stonith
>>>> resource is 15s. Does it mean that when a stonith is sent by the first
>>>> node to the second one and this node can't shut itself down in 15s, it
>>>> stoniths the first node?
>>>
>>> No.
>>> This is unrelated
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
