Hi,

On Fri, Jul 03, 2009 at 11:04:11AM +0200, Jan Kalcic wrote:
> Jan Kalcic wrote:
> > Dejan Muhamedagic wrote:
> >   
> >> Hi,
> >>
> >> On Fri, Jun 26, 2009 at 04:33:30PM +0200, Jan Kalcic wrote:
> >>
> >>> Andrew Beekhof wrote:
> >>>
> >>>> On Fri, Jun 26, 2009 at 3:07 PM, Jan Kalcic<[email protected]> wrote:
> >>>>
> >>>>> Andrew Beekhof wrote:
> >>>>>
> >>>>>> On Fri, Jun 26, 2009 at 10:55 AM, Jan<[email protected]> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> an annoying issue with stonith using the plugin external/riloe (which
> >>>>>>> I have never used before). Whenever I try to simulate a split-brain
> >>>>>>> condition (using iptables) in order to test stonith, both nodes kill
> >>>>>>> each other. Not exactly what I expected.
> >>>>>>>
> >>>>>> Sure it is
> >>>>>>
> >>>>>> [snip]
> >>>>>>
> >>>>>>>        <nvpair id="nvpair-56c027e0-80c8-49a7-9cf1-1af593a9391f"
> >>>>>>> name="no-quorum-policy"
> >>>>>>> value="ignore"/>
> >>>>>>>
> >>>>>> With that option, this is exactly what I'd expect.
> >>>>>>
> >>>>>> Have a read of:
> >>>>>>    http://ourobengr.com/ha
> >>>>>>
> >>>>> From what I understood, possibly wrongly, that should be the right
> >>>>> option for a two-node cluster, where a single node can't have quorum;
> >>>>> that's why it should be "ignore". Is this wrong?
> >>>>>
> >>>>> I had already taken a quick look at that document (I love that picture
> >>>>> btw) but not as deeply as now. I am going to review my timeout for sure.
> >>>>> Anyway, I don't get any hint about the quorum setting. Should it be
> >>>>> different than "ignore"?
> >>>>>
> >>>> No, that's the right value for a two-node cluster.
> >>>> But that value can also lead to the behavior you described.
> >>>>
> >>>> Though normally one side shoots the other before it can shoot back.
> >>>>
> >>> This does not happen. The reason could be that, using iLO, the node is
> >>> not actually shot but shut down gracefully. For this reason the shot node
> >>> has plenty of time to shoot the other side back. Does that make sense?
> >>>
> >> Yes, it does.
> >>
> >>   
> >>     
> >>> In this case I would need to stonith the other side not gracefully but
> >>> hard, like pulling the power cable, but it seems this is not available
> >>> with the riloe plugin, is it?
> >>>     
> >>>       
> >> Yes, it is. You should use the latest version of the plugin.
> >>   
> >>     
> >
> > I checked the plugin's version and it seems to be the latest one. It
> > is the one installed with SLES11-HA. A diff against the plugin available
> > on the openSuSE build service for openSuSE 11.1 reports they are the same.
> >   
> >> ilo_powerdown_method should be set to power, AFAIK. I think that
> >> that does a "cable pull" operation. If you still find a problem
> >> with nodes shooting each other at the same time, please file a
> >> bugzilla. I'm not sure if that can be fixed, depends on the
> >> timings when talking to the device.
> >>   
> >>     
> >
> > I will try with the power option in the next few days. What confuses
> > me is the description below, which I extracted from the plugin: "power"
> > takes longer than "button". I would expect it to shoot the node
> > immediately so that it cannot shoot back.
> >
> > <shortdesc lang="en">Power down method</shortdesc>
> > <longdesc lang="en">
> > The method to powerdown the host in question.
> > * button - Emulate holding down the power button
> > * power - Emulate turning off the machines power
> >
> > NB: A button request takes around 20 seconds. The power method
> > about half a minute.
> >
> >   
> OK, actually the power method was the one I was already using. What I
> changed is the stonith action, from poweroff, which shuts the node down
> gracefully, to reboot, which does reboot the server but also resets
> it within a few seconds.

Not sure I understand this. poweroff results in the
SET_HOST_POWER request, which should simply remove power from
the host. But if ilo_can_reset is '0', then reset is also done as
poweroff followed by poweron. Perhaps you could also set
ilo_can_reset; that would make the plugin use the actual iLO
reset command. Some iLOs don't support it, though.
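
To spell out the logic as I described it, here is a rough Python
sketch. It is not the plugin's code: the function name and the
operation labels ("power_off", "power_on", "reset") are made up for
illustration, and only the branching on ilo_can_reset reflects what
is said above.

```python
from typing import List


def plan_ilo_requests(action: str, ilo_can_reset: bool) -> List[str]:
    """Return the ordered iLO operations for a stonith action.

    The operation labels are illustrative placeholders, not real
    iLO command names.
    """
    if action == "off":
        # poweroff: a single request that removes power from the host
        return ["power_off"]
    if action == "on":
        return ["power_on"]
    if action == "reset":
        if ilo_can_reset:
            # the iLO supports a real reset command, so use it directly
            return ["reset"]
        # otherwise reset is emulated as poweroff followed by poweron
        return ["power_off", "power_on"]
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    for can_reset in (False, True):
        print(can_reset, plan_ilo_requests("reset", can_reset))
```

The point is that with ilo_can_reset unset a reset is two requests
rather than one, so it can take noticeably longer to complete.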

> The deathmatch no longer occurs. From the command line I managed to
> stonith the node just the way I want, a reset with no reboot
> (-T reset), but I could not "move" this command into pacemaker.

Strange, since the stonith program uses the same plugin.

Thanks,

Dejan

> Thanks,
> Jan
> 
> > Thanks,
> > Jan
> >   
> >> Thanks,
> >>
> >> Dejan
> >>
> >>
> >>
> >>   
> >>     
> >>> Thanks,
> >>> Jan
> >>>     
> >>>       
> >>>>> My issue isn't exactly the deathmatch described there, first of all
> >>>>> because the openais daemon is disabled at boot and secondly because the
> >>>>> stonith policy is poweroff. Rather, it is a strange situation where both
> >>>>> nodes kill themselves and they both shut down.
> >>>>>     
> >>>>>         
> >>>>>           
> >>>> They'd both be killing each other.
> >>>>
> >>>>   
> >>>>       
> >>>>         
> >>>>> I wonder if it is a timeout issue. My timeout here for the stonith
> >>>>> resource is 15s. Does it mean that when a stonith is sent by the first
> >>>>> node to the second one and this node can't shut itself down in 15s, it
> >>>>> stoniths the first node?
> >>>>>     
> >>>>>         
> >>>>>           
> >>>> No.  This is unrelated
> >>>> _______________________________________________
> >>>> Linux-HA mailing list
> >>>> [email protected]
> >>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >>>> See also: http://linux-ha.org/ReportingProblems