Hi,

On Fri, Jul 03, 2009 at 11:04:11AM +0200, Jan Kalcic wrote:
> Jan Kalcic wrote:
> > Dejan Muhamedagic wrote:
> >
> >> Hi,
> >>
> >> On Fri, Jun 26, 2009 at 04:33:30PM +0200, Jan Kalcic wrote:
> >>
> >>> Andrew Beekhof wrote:
> >>>
> >>>> On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic<[email protected]> wrote:
> >>>>
> >>>>> Andrew Beekhof wrote:
> >>>>>
> >>>>>> On Fri, Jun 26, 2009 at 10:55 AM, Jan<[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> a very boring issue with stonith using the plugin external/riloe
> >>>>>>> (never used it). Whenever I try to simulate a split-brain
> >>>>>>> condition (using iptables) in order to test stonith, both nodes
> >>>>>>> kill each other. Not exactly what expected.
> >>>>>>>
> >>>>>> Sure it is
> >>>>>>
> >>>>>> [snip]
> >>>>>>
> >>>>>>> <nvpair id="nvpair-56c027e0-80c8-49a7-9cf1-1af593a9391f"
> >>>>>>> name="no-quorum-policy"
> >>>>>>> value="ignore"/>
> >>>>>>>
> >>>>>> With that option, this is exactly what I'd expect.
> >>>>>>
> >>>>>> Have a read of:
> >>>>>> http://ourobengr.com/ha
> >>>>>>
> >>>>> From what I understood, probably wrongly, that should be the right option
> >>>>> for a two-node cluster, where only one node can't have quorum; that's
> >>>>> why it should be "ignore". Is this wrong?
> >>>>>
> >>>>> I had already taken a quick look at that document (I love that picture
> >>>>> btw) but not as deeply as now. I am going to review my timeout for sure.
> >>>>> Anyway, I don't get any hint about the quorum setting. Should it be
> >>>>> different from "ignore"?
> >>>>>
> >>>> No, that's the right value for a two-node cluster.
> >>>> But that value can also lead to the behavior you described.
> >>>>
> >>>> Though normally one side shoots the other before it can shoot back.
> >>>>
> >>> This does not happen. The reason could be that using iLO the node is not
> >>> actually shot but gracefully shut down. For this reason the shot node has
> >>> all the time to shoot the other side back. Make sense?
> >>>
> >> Yes, it does.
> >>
> >>> In this case I would need to stonith the other side not gracefully but
> >>> strongly, like unplugging the cable, but it seems this is not available
> >>> with the riloe plugin, is it?
> >>>
> >> Yes, it is. You should use the latest version of the plugin.
> >>
> >
> > I checked the plugin's version and it seems to be the very latest one. It
> > is the one installed with SLES11-HA. A diff with the plugin available on
> > the openSuSE build service for openSuSE 11.1 reports they are the same.
> >
> >> ilo_powerdown_method should be set to power, AFAIK. I think that
> >> that does a "cable pull" operation. If you still find a problem
> >> with nodes shooting each other at the same time, please file a
> >> bugzilla. I'm not sure if that can be fixed, depends on the
> >> timings when talking to the device.
> >>
> >
> > I will try with the power option in the next few days. What leaves me
> > confused is the description below, which I extracted from the plugin.
> > "power" takes longer than button. I would expect it to shoot the node
> > immediately so as not to be stonithed back.
> >
> > <shortdesc lang="en">Power down method</shortdesc>
> > <longdesc lang="en">
> > The method to powerdown the host in question.
> > * button - Emulate holding down the power button
> > * power - Emulate turning off the machines power
> >
> > NB: A button request takes around 20 seconds. The power method
> > about half a minute.
> >
>
> Ok, actually the power method was the one I was already using. What I
> changed is the stonith action from poweroff, which gracefully shuts down
> the node, to reboot, which actually reboots the server but also resets
> it in a few seconds.

Not sure if I understand this. poweroff does result in the
SET_HOST_POWER request, which should just remove the power from
the host. But if ilo_can_reset is '0', then reset is also
poweroff followed by poweron. Perhaps you can also set
ilo_can_reset. That would make the plugin use the actual iLO
reset command. Some iLOs don't support it, though.

> Deathmatch no longer occurs. From the command line I
> managed to stonith the node just like I want. Reset and with no reboot
> (-T reset), but I could not "move" this command into pacemaker.

Strange, since the stonith program uses the same plugin.

Thanks,

Dejan

> Thanks,
> Jan
>
> > Thanks,
> > Jan
> >
> >> Thanks,
> >>
> >> Dejan
> >>
> >>> Thanks,
> >>> Jan
> >>>
> >>>>> My issue isn't exactly the deathmatch described there, first of all
> >>>>> because the openais daemon is disabled at boot and secondly because this
> >>>>> stonith policy is poweroff. Rather, it is a strange situation where both
> >>>>> nodes kill themselves and they both shut down.
> >>>>>
> >>>> They'd both be killing each other.
> >>>>
> >>>>> I wonder if it is a timeout issue. My timeout here for the stonith
> >>>>> resource is 15s. Does it mean that when a stonith is sent by the first
> >>>>> node to the second one and this node can't shut itself down in 15s, it
> >>>>> stoniths the first node?
> >>>>>
> >>>> No. This is unrelated
> >>>> _______________________________________________
> >>>> Linux-HA mailing list
> >>>> [email protected]
> >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>> See also: http://linux-ha.org/ReportingProblems
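
P.S. To make the ilo_can_reset point from this thread concrete, here is a
minimal sketch of the behaviour as described in the reply: a reset request
becomes a single reset when ilo_can_reset is set, and a poweroff followed by a
poweron otherwise. It is only a model of that description, not the plugin's
code. The function name power_steps and the step labels are hypothetical and
chosen purely for illustration.

```python
from typing import List


def power_steps(action: str, ilo_can_reset: bool) -> List[str]:
    """Model which power operations a stonith action maps to.

    Hypothetical illustration of the behaviour described in the thread;
    the names here are assumptions and are not taken from external/riloe.
    """
    if action == "off":
        # Per the thread, poweroff results in a SET_HOST_POWER request.
        return ["power off"]
    if action == "on":
        return ["power on"]
    if action == "reset":
        if ilo_can_reset:
            # The iLO's own reset command is used.
            return ["reset"]
        # Without reset support, reset is emulated by off followed by on.
        return ["power off", "power on"]
    raise ValueError(f"unknown action: {action!r}")


if __name__ == "__main__":
    for can_reset in (False, True):
        print("ilo_can_reset =", can_reset, "->", power_steps("reset", can_reset))
```

If this model is right, then with ilo_can_reset unset a reset includes a
separate power-off step, so it may be worth checking whether that is where
the time goes.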
