RE: [Linux-HA] trying to understand failback cause - log attached

Damon Estep Tue, 18 Mar 2008 04:59:51 -0700

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:linux-ha-
> [EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> Sent: Tuesday, March 18, 2008 3:25 AM
> To: General Linux-HA mailing list
> Subject: Re: [Linux-HA] trying to understand failback cause - log
> attached
> 
> On Tue, Mar 4, 2008 at 4:13 PM, Damon Estep <[EMAIL PROTECTED]>
> wrote:
> > I have a 2 node cluster, heartbeat version 2.1.3, crm=yes, symmetric
> >  cluster=yes, default resource stickiness=50, default resource failure
> >  stickiness=0
> >
> >
> >
> >  The desired behavior is that resources stay where they are unless there
> >  is an error starting one of them (like v1 auto_failback=off). My
> >  understanding is that a positive default resource stickiness value
> >  should achieve this (with all other scores being equal).
> >
> >
> >
> >  I have attached the ha-log from hb_report in hopes that someone can
> >  identify why the actual behavior is this:
> >
> >
> >
> >  Prior to 3:00AM the master drbd role and resource group were happily
> >  running on node cn4-inverness-co
> >
> >  At 3:00AM a cron event fired that called # shutdown -r now. This event
> >  was intended to test failover to the other node. I did in fact get a
> >  successful failover.
> >
> >  After cn4-inverness-co came back up and joined the cluster, a failback
> >  from cn3 to cn4 took place, which is not desired.
> >
> >
> >
> >  Can anyone provide any clues that might help me understand why this
> >  happened and how to prevent it? I can duplicate it consistently on 3
> >  different 2 node clusters with the same configuration.
> 
> Without seeing your config, I'm guessing that the score in one or more
> of your resource location constraints exceeded the value you set for
> default resource stickiness (i.e. 50)
> 
[Damon Estep] 
That is correct. However, I could not overcome it. The DRBD Master/Slave
resource agent appears to do some internal score manipulation that is
difficult to understand and that conflicted with the score manipulation
I needed to achieve. I had to switch to a V1-style cluster and turn off
auto_failback to get predictable results.
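
The interaction described above, where a location constraint score larger than the stickiness pulls the resource back, can be illustrated with a small sketch. This is a deliberately simplified additive model and not the CRM policy engine's actual algorithm. It ignores any scores the DRBD resource agent sets itself. The location score of 100, the alternative stickiness of 200, and the function name preferred_node are hypothetical values chosen for illustration; only the stickiness of 50 and the node names come from the thread.

```python
def preferred_node(location_scores, current_node, stickiness):
    """Return (best_node, totals) under a simplified additive score model.

    Assumption: a node's total is its location constraint score, plus the
    stickiness if the resource is currently running there. The real policy
    engine considers more inputs than this.
    """
    totals = {
        node: score + (stickiness if node == current_node else 0)
        for node, score in location_scores.items()
    }
    # Highest total wins; on a tie the resource stays where it is.
    best = max(totals, key=lambda node: (totals[node], node == current_node))
    return best, totals


if __name__ == "__main__":
    # Hypothetical location constraint preferring cn4-inverness-co by 100.
    scores = {"cn3": 0, "cn4-inverness-co": 100}
    # After the failover the resource runs on cn3.
    for stickiness in (50, 200):
        best, totals = preferred_node(scores, "cn3", stickiness)
        print(stickiness, totals, "->", best)
```

In this model a stickiness of 50 cannot hold the resource on cn3 against a location preference of 100 for the other node, while a stickiness larger than the preference can. That matches the guess in the quoted reply, but it does not account for the agent-side scoring mentioned above.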
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems