Ok. An rsc_order constraint from a resource to pingd, combined with a
timeout in the pingd RA, fixed the problem. Here is a patch for pingd.in
if you want to apply it:

--- resources/OCF/pingd.in.distr        2007-06-05 09:38:31.000000000 -0600
+++ resources/OCF/pingd.in      2007-06-05 09:39:16.000000000 -0600
@@ -161,6 +161,8 @@

    rc=$?
    if [ $rc = 0 ]; then
+        #Give it some time to populate scores.
+        sleep `expr ${OCF_RESKEY_dampen%%s} + 5`
       exit $OCF_SUCCESS
    fi
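
The delay computed in the patch can be sketched as a standalone helper. This is only an illustration: the function name dampen_delay, its default values, and the fallback to 0 for values that are not plain seconds are assumptions and not part of pingd.in. It strips a trailing "s" from the dampen value the same way the patch does, then adds a safety margin.

```shell
#!/bin/sh
# Hypothetical helper (not part of pingd.in): compute how many seconds to
# wait after pingd starts so that it has time to publish its scores.
#   $1 - dampen value such as "5s" or "10" (assumed default: 5s)
#   $2 - extra safety margin in seconds   (assumed default: 5)
dampen_delay() {
    dampen=${1:-5s}
    extra=${2:-5}
    secs=${dampen%%s}            # strip a trailing "s", as the patch does
    case "$secs" in
        ''|*[!0-9]*) secs=0 ;;   # assumption: fall back to 0 if not a plain integer
    esac
    echo $((secs + extra))
}

# Example use inside a start function:
#   sleep "$(dampen_delay "$OCF_RESKEY_dampen")"
```

The guard matters because a dampen value with another suffix (for example "2m") would make the expr call in the patch fail with a non-integer argument, leaving sleep without a valid duration.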


On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
On 6/5/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > Hello -
> >
> > I played with pingd in v2 heartbeat and found some problems (or
> > inconveniences) there:
> >
> > My configuration includes a group of resources and a rsc_location rule
> > for a primary node. If I configure pingd in ha.cf and add a
> > rsc_location rule with score -INFINITY for when the pingd attribute is
> > not defined or is less than or equal to 0, everything works as it
> > should. My group starts on the primary node and fails over to the
> > backup node if the primary loses its network connection.
> >
> > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > clone for it there. It looks like (I'm not absolutely sure of that)
> > pingd doesn't have enough time after it starts up to update the CIB
> > before Heartbeat starts other resources.
>
> do you have ordering constraints between the pingd resource and the
> other resources?

Adding ordering constraints didn't help. Probably constraints plus a
timeout in the RA would help. I'm going to test it.

>
> > Because of that Heartbeat
> > complains that there are no nodes available for resources or that
> > resources can't run on any node in the cluster.
>
> presumably because there are no pingd scores yet - that's perfectly normal so far

Absolutely true.

>
> > With the second check
> > heartbeat sees nodes available but at this time there is no guarantee
> > that resources will be started on a desired primary node.
>
> this bit i'm not sure i understand

Ok, here I tried to explain that after the pingd scores have been populated
the other rsc_location rule (the one that defines the primary box) gets
ignored. That is probably because the pingd score for the secondary box
gets populated a bit earlier than for the primary.

>
> do you mean the pingd scores haven't stabilized?
> or that they're equal and you can't make the resource start on a
> particular node?

They stabilized, but as I said I can't guarantee that a resource
starts on the primary node. Sometimes it starts on the primary, sometimes
on the backup node.

>
> >
> > I hope that I explained the problem correctly. A possible fix could
> > be implementing a short timeout (OCF_RESKEY_dampen + 3s for example)
> > in the start function of the pingd RA.
> >
> > There were also some mistakes in the v2/faq/pingd document that I
> > corrected in wiki.linux-ha.org
>
> thanks!
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
