On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
OK. An rsc_order constraint from the resource to pingd, plus a timeout
in the pingd RA, fixed the problem. Here is a patch for pingd.in if you
want to apply it:

It assumes "dampen" is always given in seconds.


--- resources/OCF/pingd.in.distr        2007-06-05 09:38:31.000000000 -0600
+++ resources/OCF/pingd.in      2007-06-05 09:39:16.000000000 -0600
@@ -161,6 +161,8 @@

     rc=$?
     if [ $rc = 0 ]; then
+        #Give it some time to populate scores.
+        sleep `expr ${OCF_RESKEY_dampen%%s} + 5`
        exit $OCF_SUCCESS
     fi
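
Since the patch only strips a trailing "s" from dampen, a value given in
another unit would break the expr call. Below is a rough sketch of a more
tolerant conversion. The helper name dampen_to_seconds is hypothetical (it
is not part of the pingd RA), and the set of suffixes it handles (ms, s,
m, h) is an assumption about what dampen may contain, not something taken
from the RA itself.

```sh
#!/bin/sh
# Hypothetical helper, not part of the pingd RA: convert a dampen value
# such as "5", "5s", "2m", "1h" or "500ms" into whole seconds.
# The suffix set handled here is an assumption.
dampen_to_seconds() {
    value=$1
    case "$value" in
        # "ms" must be tested before "s", because "500ms" also ends in "s".
        *ms) num=${value%ms}; echo $(( (num + 999) / 1000 )) ;;  # round up
        *s)  echo "${value%s}" ;;
        *m)  num=${value%m}; echo $(( num * 60 )) ;;
        *h)  num=${value%h}; echo $(( num * 3600 )) ;;
        *)   echo "$value" ;;   # bare number: treated as seconds
    esac
}

# Possible use in the start action (kept as a comment so the sketch does
# not actually sleep; the 5-second margin is the one used in the patch):
#   sleep $(( $(dampen_to_seconds "$OCF_RESKEY_dampen") + 5 ))
```

With this in place the sleep in the start action would no longer depend on
dampen being written in seconds, though it is still a fixed delay rather
than a check that the scores have actually been populated.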


On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On 6/5/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > > Hello -
> > >
> > > I played with pingd in Heartbeat v2 and found some problems (or
> > > inconveniences) there:
> > >
> > > My configuration includes a group of resources and an rsc_location rule
> > > for a primary node. If I configure pingd in ha.cf and add an
> > > rsc_location rule with score -INFINITY for when the pingd attribute is
> > > not defined or is less than or equal to 0, everything works as it
> > > should. My group starts on the primary node and fails over to the
> > > backup node if the primary loses its network connection.
> > >
> > > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > > clone for it there. It looks like (I'm not absolutely sure of that)
> > > pingd doesn't have enough time after it starts up to update the CIB
> > > before Heartbeat starts the other resources.
> >
> > do you have ordering constraints between the pingd resource and the
> > other resources?
>
> Adding ordering constraints didn't help. Probably constraints plus a
> timeout in the RA would help. I'm going to test it.
>
> >
> > > Because of that, Heartbeat
> > > complains that there are no nodes available for resources or that
> > > resources can't run on any node in the cluster.
> >
> > presumably because there are no pingd scores yet - that's perfectly
> > normal so far
>
> Absolutely true.
>
> >
> > > With the second check
> > > heartbeat sees nodes available but at this time there is no guarantee
> > > that resources will be started on a desired primary node.
> >
> > this bit i'm not sure i understand
>
> OK, here I tried to explain that after the pingd scores have been
> populated, the other rsc_location rule (the one that defines the primary
> box) gets ignored. That is probably because the pingd score for the
> secondary box gets populated a bit earlier than the one for the primary.
>
> >
> > do you mean the pingd scores haven't stabilized?
> > or that they're equal and you can't make the resource start on a
> > particular node?
>
> They stabilized, but as I said, I can't guarantee that a resource
> starts on the primary node. Sometimes it starts on the primary,
> sometimes on the backup node.
>
> >
> > >
> > > I hope that I explained the problem correctly. A possible fix could
> > > be to implement a short timeout (OCF_RESKEY_dampen + 3s, for example)
> > > in the start function of the pingd RA.
> > >
> > > There were also some mistakes in the v2/faq/pingd document that I
> > > corrected in wiki.linux-ha.org
> >
> > thanks!
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
>