Hi Andrew.
Are the counters supposed to be zeroed at boot, as they seem to be in my case?
Or is that a problem on my side?
I would expect them to get their values from the running heartbeat system (the DC).
Is there a way to prevent the standby from trying to start the two services
simultaneously?
Also:
I have tried the same case with the default value for failure stickiness but only
one network for heartbeat. It behaves really badly. The standby may refuse to
take over for any of the servers, regardless of which server I disconnected from
power.

BR.


 *** Thomas
This communication is confidential and intended solely for the addressee(s). 
Any unauthorized review, use, disclosure or distribution is prohibited. If you 
believe this message has been sent to you in error, please notify the sender by 
replying to this transmission and delete the message without disclosing it. 
Thank you.
E-mail including attachments is susceptible to data corruption, interruption, 
unauthorized amendment, tampering and viruses, and we only send and receive 
e-mails on the basis that we are not liable for any such corruption, 
interception, amendment, tampering or viruses or any consequences thereof.


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
Sent: 5 June 2007 15:27
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] standby does not take over on multiple power failure

On 6/5/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> Hi
> OK, so the problem is that there was an error starting the service on the
> standby server because of the simultaneous failures. Because of the
> failure stickiness of -INFINITY, it would not run there again. For some
> reason the counter resets on ha-8, and therefore it will run there.
> I was rather confused by the counters, as -D gives an error if the counter
> is 0, but it works as you suggest. At least some things are a bit clearer
> now. I'm sending the complete logfile, as far as I have it, to see whether
> it passes your size limit.


the logs are noisy (we're always working on that) but they do compress well :-)

> BR.
> Thomas
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> Sent: 5 June 2007 12:47
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] standby does not take over on multiple power failure
>
> according to these logs it happened even earlier...
>
> can I suggest running:
>
> crm_failcount -D -r rsc_lim3 -U ha-9
> crm_failcount -D -r rsc_lim8 -U ha-9
>
> and re-testing?  I'd be very surprised if it didn't start working as a result.
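A minimal sketch of the two crm_failcount -D commands above as a loop, with a read (-G) before each delete so the old value is visible. It uses only the flags quoted in this thread (-G, -D, -U, -r); the helper name reset_failcounts and the CRM_FAILCOUNT override variable are hypothetical, and tolerating a failing -D is an assumption based on the 2.0.8 "does not exist" error reported in this thread.

```shell
#!/bin/sh
# Hypothetical helper: show, then clear, the fail-count of each listed
# resource on one node, using only the crm_failcount flags from this thread.
# CRM_FAILCOUNT (hypothetical) lets you point at another binary or a stub.
CRM_FAILCOUNT=${CRM_FAILCOUNT:-crm_failcount}

# reset_failcounts NODE RESOURCE...
reset_failcounts() {
    node=$1
    shift
    for rsc in "$@"; do
        echo "== $rsc on $node"
        # Read the current fail-count (-G) for this resource/node pair.
        "$CRM_FAILCOUNT" -G -U "$node" -r "$rsc"
        # Delete it (-D). On 2.0.8 this reports "does not exist" when the
        # count is already 0, so a failure here is reported, not fatal.
        "$CRM_FAILCOUNT" -D -U "$node" -r "$rsc" || echo "   (no fail-count to delete)"
    done
}
```

For example, `reset_failcounts ha-9 rsc_lim3 rsc_lim8` performs the same two deletions as the commands above, each preceded by a read.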
>
> On 6/5/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> > OK, that would explain that part.
> > Here is the logfile covering that day.
> >
> > BR.
> > /Thomas
> >
> >
> > -----Original Message-----
> > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> > Sent: 5 June 2007 09:49
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] standby does not take over on multiple power failure
> >
> > On 6/5/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> > > Hi Andrew, and thank you for the prompt response.
> > > My objection regarding the failure stickiness is that in my lab rscX does
> > > get back to nodeY after running on the standby.
> > >
> > > Ex:
> > > I pull the power cord to ha-8
> > >         rsc_lim8 will start to run on the standby ha-9
> > > I insert the power cord to ha-8
> > >         rsc_lim8 will stop on ha-9 and start to run on ha-8
> > > This can be repeated.
> > >
> > > In my case the standby gets into a state where it will never start
> > > rsc_lim8 when ha-8 goes down. That happens after the procedure I
> > > described: I first pull the power cord of one server, reinsert it, and
> > > pull the cord of another server before the first has started properly.
> > > That behavior ceased when I removed the failure stickiness.
> >
> >
> >
> > resource_failure_stickiness shouldn't apply here because the entire
> > node failed rather than just the resource.
> >
> > according to the logs you attached, something other than a full node
> > failure also happened, which caused the failcounts for rsc_lim3 and
> > rsc_lim8 to be set to 1.
> >
> > pengine[24452]: 2007/06/04_08:52:57 debug: process_rsc_state: fail-count-rsc_lim3: 1
> > pengine[24452]: 2007/06/04_08:52:57 debug: process_rsc_state: Setting failure stickiness for rsc_lim3 on ha-9: -1000000
> > pengine[24452]: 2007/06/04_08:52:57 debug: process_rsc_state: fail-count-rsc_lim8: 1
> > pengine[24452]: 2007/06/04_08:52:57 debug: process_rsc_state: Setting failure stickiness for rsc_lim8 on ha-9: -1000000
> >
> > unfortunately the logs don't go back far enough to know what that event
> > was or when it happened
> >
> > >
> > > BR.
> > > /Thomas
> > >
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> > > Sent: 4 June 2007 13:20
> > > To: General Linux-HA mailing list
> > > Subject: Re: [Linux-HA] standby does not take over on multiple power failure
> > >
> > > On 6/4/07, Thomas Åkerblom (HF/EBC) <[EMAIL PROTECTED]> wrote:
> > > > Hi Andrew.
> > > > I'm using 2.0.8-0.15, but I have seen the same behavior in 2.0.7.
> > > > In this case ha-9 is the DC and also the standby server.
> > > > ha-8 has no power, but the standby server does not take over.
> > > > The logs begin right before I pulled the power cord.
> > > >
> > > > Actually I do know how to get around this problem now, but I also have 
> > > > some new questions.
> > > > If I remove the line:
> > > > <nvpair id="default_resource_failure_stickiness" 
> > > > name="default_resource_failure_stickiness" value="-INFINITY"/>
> > > > from the CIB file, the problem disappears.
> > > > I wouldn't expect that parameter to have this effect, rather the 
> > > > opposite.
> > > > Is this a known/expected correlation?
> > >
> > > not so much "correlation" as "that's what it's designed to do".
> > >
> > > setting default_resource_failure_stickiness=-INFINITY means that if
> > > heartbeat finds rscX failed on nodeY, it will never again consider
> > > nodeY a valid place to run rscX... at least not until the admin
> > > "clears" the error by resetting the failcount.
> > >
> > > in the future we'll expire the failures after "a period of time" but
> > > that is not yet implemented, as the lrm doesn't provide the information
> > > to do so.
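A toy model of the scoring rule described above, written as a POSIX shell sketch. It assumes, based on the "Setting failure stickiness ... -1000000" debug lines in this thread rather than on heartbeat's source, that a node's penalty for a resource is its fail-count multiplied by resource_failure_stickiness, clamped to heartbeat's INFINITY of 1000000. The function names failure_score and node_allowed are hypothetical.

```shell
#!/bin/sh
# Toy model (assumption, not heartbeat's code): penalty = fail-count *
# resource_failure_stickiness, clamped to +/-INFINITY (1000000).
INFINITY=1000000

# failure_score FAILCOUNT STICKINESS
# STICKINESS may be an integer or the string -INFINITY / INFINITY.
failure_score() {
    count=$1
    stick=$2
    case "$stick" in
        -INFINITY) stick=-$INFINITY ;;
        INFINITY|+INFINITY) stick=$INFINITY ;;
    esac
    score=$((count * stick))
    # Saturate at +/-INFINITY, as scores beyond it carry no extra meaning.
    if [ "$score" -lt "-$INFINITY" ]; then
        score=-$INFINITY
    elif [ "$score" -gt "$INFINITY" ]; then
        score=$INFINITY
    fi
    echo "$score"
}

# node_allowed FAILCOUNT STICKINESS
# Succeeds while the node is still a candidate, i.e. its score is above
# -INFINITY; a score of -INFINITY rules the node out for the resource.
node_allowed() {
    [ "$(failure_score "$1" "$2")" -gt "-$INFINITY" ]
}
```

For example, `failure_score 1 -INFINITY` prints -1000000, the value logged for rsc_lim3 and rsc_lim8 on ha-9, while `failure_score 0 -INFINITY` prints 0. Under this model that is why clearing the fail-count with crm_failcount -D makes the node eligible again.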
> > >
> > > > I would like to set that parameter in order to be able to use the 
> > > > failure counters.
> > > > Furthermore I am not able to read and reset the counters using:
> > > >
> > > > crm_failcount -G -U ha-8 -r rsc_lim8
> > > >         The result is always 0
> > > >
> > > > crm_failcount -D -U ha-8 -r rsc_lim8
> > > >         Error performing operation: The object/attribute does not exist.
> > >
> > > later versions return 0 instead of "The object/attribute does not exist."
> > >
> > > updated packages for most distros/platforms are available at:
> > >    http://software.opensuse.org/download/server:/ha-clustering/
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems