Hi Dave,

On Wed, Oct 03, 2007 at 08:49:23AM -0500, Dave Blaschke wrote:
> Dejan Muhamedagic wrote:
> >Hi,
> >
> >On Tue, Oct 02, 2007 at 10:55:03PM +0100, Peter Farrell wrote:
> >
> >>On 02/10/2007, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> >>
> >>>Hi,
> >>>
> >>>On Tue, Oct 02, 2007 at 05:17:38PM +0100, Peter Farrell wrote:
> >>>
> >>>>Can someone verify my CIB please?
> >>>>
> >>>>It's not working as intended and the more I read the less I
> >>>>understand...
> >>>>I've stared at the config for the past 2 days hoping to be struck by
> >>>>sudden understanding... hasn't happened yet.
> >>>>
> >>>Don't worry, the learning curve is extremely steep. We all need
> >>>quite some patience.
> >>>
> >>>
> >>>>I don't understand how you make a rule, and then call that rule as a
> >>>>result of an action. I used the bit from the pingd FAQ page:
> >>>>http://www.linux-ha.org/v2/faq/pingd
> >>>>"Quickstart - Only Run my_resource on Nodes with Access to at Least
> >>>>One Ping Node"
> >>>>
> >>>>So - for my pingd clone, the operation is 'monitor' and 'on_fail=fence'
> >>>><op id="pingd-child-monitor" name="monitor" interval="20s"
> >>>>timeout="40s" prereq="nothing" on_fail="fence"/>
> >>>>
> >>>>I assume that this literally means:
> >>>>"ask the LRM to see if pingd is running every 20s, if after 40s pingd
> >>>>is not running, call it 'failed', and as it's 'failed' - fence it off,
> >>>>which forces the resource to migrate to another node and marks this
> >>>>one as 'degraded' and will not allow resource to run until it's been
> >>>>cleaned up"
> >>>>
> >>>>Is that right? If so, then this bit I'm OK with.
> >>>>
> >>>No, not exactly. The monitor operation may fail (i.e. the
> >>>resource agent says that the resource isn't running) or timeout
> >>>(that's what you described). Of course, both are considered to be
> >>>failures by CRM. on_fail=fence means that if this operation
> >>>fails, the node will be fenced, i.e. rebooted if you have an
> >>>operational stonith device. Perhaps a tad harsh for a monitor
> >>>failure.
> >>>
> >>1. The approach for me is (this is a test cluster - but I want to use
> >>it to replace a production one) - if either of the load balancers
> >>can't ping one or two routers in my DMZ, then this must mean they're
> >>dead. I figured if they can't see the router - how the hell can they
> >>see the apache servers they're meant to be managing?
> >>Is this 'correct political thought' or a sloppy foundation to begin with?
> >>
> >
> >It's just that the resources _are_ going to move. No need to kill
> >the cooperating node.
> >
> >
> >>2. I didn't know that fence meant 'rebooted'. I thought it was sort of
> >>'fenced off' and left in a degraded state should someone want to poke
> >>around a bit.
> >>RE: Perhaps a tad harsh for a monitor failure - I agree. But what's a
> >>girl to do?
> >>Am I on the right track here? Do I want it rebooting? Do I just want
> >>Heartbeat to restart? Does it matter? If it comes up and the link is
> >>still dead - will it cycle forever w/ reboots?
> >>
> >
> >Not sure, but could be. Whenever a node comes up all resources
> >are probed, i.e. one monitor operation is fired.
> >
> >
> >>3. the real bit I'm missing: Let's say I want it rebooted after
> >>fencing.
> >>
> >
> >Fencing _is_ rebooting.
> >
> >
> Sorry to jump in the middle of this thread, but can't you also power off
> the node by setting stonith_action to poweroff instead of reboot? Of
> course you need a stonith device that supports ST_POWEROFF... I haven't
> read through the code but I'd assume that option works.

True. Thanks for mentioning this.

Dejan

>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
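
For reference, the two settings discussed in this thread might look roughly like the fragment below in a Heartbeat v2 CIB. This is only a sketch under assumptions, not a tested configuration: the monitor op is the one Peter posted, with on_fail switched from "fence" to "restart" as one less drastic choice; the stonith_action property is the one Dave mentions; the nvpair ids, the property-set id and the stonith_enabled line are placeholders added for illustration. As Dave notes, poweroff presumably only works if the stonith device supports it.

```xml
<!-- Cluster-wide options (sketch; ids are placeholders, not from the thread). -->
<crm_config>
  <cluster_property_set id="cib-bootstrap-options">
    <attributes>
      <!-- Assumption: fencing is enabled and a working stonith device is configured. -->
      <nvpair id="opt-stonith-enabled" name="stonith_enabled" value="true"/>
      <!-- Power the fenced node off instead of rebooting it (the default is reboot). -->
      <nvpair id="opt-stonith-action" name="stonith_action" value="poweroff"/>
    </attributes>
  </cluster_property_set>
</crm_config>

<!-- Monitor op from the thread, inside the pingd clone's <operations> section.
     on_fail="restart" is an assumed alternative to "fence": a failed or timed-out
     monitor then restarts the resource rather than fencing the whole node. -->
<op id="pingd-child-monitor" name="monitor" interval="20s"
    timeout="40s" prereq="nothing" on_fail="restart"/>
```

With on_fail="fence" left in place, a monitor failure or timeout would instead trigger whatever stonith_action is set to, so the two settings interact: "fence" plus "poweroff" leaves the node down until someone powers it back on, which would also avoid the reboot loop Peter asks about.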
