Logs and cib.xml are attached. As you can see, per the configuration
TestGroup should start on the node called goodman (and it does if I
configure pingd in ha.cf), but it starts on miller instead.
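
For anyone reading without the attachment, the two location rules discussed
in this thread would look roughly like the sketch below. It uses the
heartbeat v2 CIB syntax. The ids and the score of 100 are made up for
illustration and are not copied from the attached cib.xml. Only the names
TestGroup and goodman and the -INFINITY pingd rule come from the thread.

```xml
<constraints>
  <!-- Prefer the primary node. The ids and score="100" are assumptions. -->
  <rsc_location id="TestGroup_prefer_goodman" rsc="TestGroup">
    <rule id="TestGroup_prefer_goodman_rule" score="100">
      <expression id="TestGroup_prefer_goodman_expr"
                  attribute="#uname" operation="eq" value="goodman"/>
    </rule>
  </rsc_location>

  <!-- Forbid any node where pingd is undefined or less than or equal to 0. -->
  <rsc_location id="TestGroup_connected" rsc="TestGroup">
    <rule id="TestGroup_connected_rule" score="-INFINITY" boolean_op="or">
      <expression id="TestGroup_connected_undefined"
                  attribute="pingd" operation="not_defined"/>
      <expression id="TestGroup_connected_zero"
                  attribute="pingd" operation="lte" value="0"/>
    </rule>
  </rsc_location>
</constraints>
```

With rules of this shape, a node is excluded until its pingd attribute has
been written to the CIB, which is why the timing of the first pingd update
matters here.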

On 6/6/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On 6/5/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > > Hello -
> > >
> > > I played with pingd in v2 heartbeat and found some problems (or
> > > inconveniences) there:
> > >
> > > My configuration includes a group of resources and a rsc_location rule
> > > for a primary node. If I configure pingd in ha.cf and add a
> > > rsc_location rule with score -INFINITY for when the pingd attribute is
> > > not defined or is less than or equal to 0, everything works as it
> > > should. My group starts on the primary node and fails over to the backup
> > > node if the primary loses its network connection.
> > >
> > > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > > clone for it there. It looks like (I'm not absolutely sure of that)
> > > when pingd starts up it doesn't have enough time to update the CIB
> > > before Heartbeat starts other resources.
> >
> > do you have ordering constraints between the pingd resource and the
> > other resources?
>
> Adding ordering constraints didn't help. Probably constraints plus a
> timeout in the RA would help. I'm going to test it.
>
> >
> > > Because of that Heartbeat
> > > complains that there are no nodes available for resources or that
> > > resources can't run on any node in the cluster.
> >
> > presumably because there are no pingd scores yet - that's perfectly
> > normal so far
>
> Absolutely true.
>
> >
> > > With the second check
> > > heartbeat sees nodes available but at this time there is no guarantee
> > > that resources will be started on a desired primary node.
> >
> > this bit i'm not sure i understand
>
> Ok, here I tried to explain that after the pingd scores have been
> populated the other rsc_location rule (that defines the primary box) gets
> ignored.

ignored?  no way.

> That's probably because the pingd score for the secondary box gets
> populated a bit earlier than for the primary.

do you have logs showing this?  any delay should be extremely negligible.

> >
> > do you mean the pingd scores haven't stabilized?
> > or that they're equal and you can't make the resource start on a
> > particular node?
>
> They stabilized, but as I said I can't guarantee that a resource starts
> on the primary node. Sometimes it starts on the primary, sometimes on
> the backup node.
>
> >
> > >
> > > I hope that I explained the problem correctly. A possible fix could
> > > be implementing a short timeout (OCF_RESKEY_dampen + 3s for example)
> > > in the start function of the pingd RA.
> > >
> > > There were also some mistakes in the v2/faq/pingd document that I
> > > corrected on wiki.linux-ha.org
> >
> > thanks!
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems