On 6/5/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> Hello -
>
> I've been experimenting with pingd in Heartbeat v2 and found some
> problems (or inconveniences) there:
>
> My configuration includes a group of resources and an rsc_location rule
> for a primary node. If I configure pingd in ha.cf and add an
> rsc_location rule with score -INFINITY for the case where the pingd
> attribute is not defined or is less than or equal to 0, everything works
> as it should. My group starts on the primary node and fails over to the
> backup node if the primary loses its network connection.
>
> Problems start when I move pingd from ha.cf to cib.xml and configure a
> clone for it there. It looks like (I'm not absolutely sure of this)
> when pingd starts up it doesn't have enough time to update the CIB
> before Heartbeat starts the other resources.

do you have ordering constraints between the pingd resource and the
other resources?

Adding ordering constraints didn't help. Perhaps constraints combined
with a timeout in the RA would help. I'm going to test that.
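The RA-side timeout idea can be sketched as follows: the start action launches the daemon, then waits for the dampen interval plus a small margin before reporting success, so that resources ordered after pingd only start once a pingd score has had a chance to reach the CIB. This is a hypothetical Python model, not the real pingd resource agent; `parse_duration`, `start_delay` and `start_pingd` are illustrative names, the 3 s margin is the example figure from this thread, and dampen values are assumed to be plain seconds or to carry an `s`/`ms` suffix.

```python
import time
from typing import Callable

# Extra grace period on top of dampen; 3 s is the example figure from this
# thread, not a Heartbeat default (assumption).
EXTRA_DELAY_S = 3.0


def parse_duration(value: str) -> float:
    """Convert "5", "5s" or "500ms" to seconds.

    Assumption: only plain seconds and the "s"/"ms" suffixes are handled.
    """
    value = value.strip().lower()
    if value.endswith("ms"):
        return float(value[:-2]) / 1000.0
    if value.endswith("s"):
        return float(value[:-1])
    return float(value)


def start_delay(dampen: str) -> float:
    """Seconds to wait after launching pingd: dampen plus the grace period."""
    return parse_duration(dampen) + EXTRA_DELAY_S


def start_pingd(dampen: str, launch: Callable[[], None],
                sleep: Callable[[float], None] = time.sleep) -> None:
    """Hypothetical start action: launch the daemon, then wait before
    returning so the first (dampened) attribute update can reach the CIB."""
    launch()
    sleep(start_delay(dampen))


print(start_delay("5s"))     # 8.0
print(start_delay("500ms"))  # 3.5
```

Passing `sleep` as a parameter keeps the model testable without actually waiting; a real agent would simply block for the computed interval.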


> Because of that, Heartbeat
> complains that there are no nodes available for the resources or that
> the resources can't run on any node in the cluster.

presumably because there are no pingd scores yet - that's perfectly normal so far

Absolutely true.


> On the second check,
> Heartbeat sees nodes available, but at this point there is no guarantee
> that the resources will be started on the desired primary node.

this bit i'm not sure i understand

OK, here I was trying to explain that after the pingd scores have been
populated, the other rsc_location rule (the one that defines the primary
box) gets ignored. That is probably because the pingd score for the
secondary box gets populated a bit earlier than the one for the primary.
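This hypothesis can be illustrated with a small Python model of the scoring. It is a hypothetical simplification, not the Heartbeat policy engine: each node's score is a fixed preference for the primary (100 here, an arbitrary figure) plus the "-INFINITY if pingd is undefined or <= 0" rule, the node names are made up, and tie-breaking and resource stickiness are not modeled. Because adding any finite preference to -INFINITY still gives -INFINITY, a backup whose pingd attribute arrives first wins regardless of the primary preference.

```python
import math
from typing import Dict, Optional

NEG_INF = -math.inf  # stands in for the CIB's -INFINITY score


def node_score(node: str, primary: str, primary_pref: float,
               pingd: Optional[int]) -> float:
    """Primary preference plus the "-INFINITY if pingd undefined or <= 0" rule.

    pingd=None models a node whose pingd attribute is not in the CIB yet.
    """
    score = primary_pref if node == primary else 0.0
    if pingd is None or pingd <= 0:
        score += NEG_INF  # any finite preference + -inf is still -inf
    return score


def choose_node(primary: str, primary_pref: float,
                attrs: Dict[str, Optional[int]]) -> Optional[str]:
    """Highest-scoring node, or None if every node is banned."""
    scores = {n: node_score(n, primary, primary_pref, v) for n, v in attrs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > NEG_INF else None


# pingd has just started: no attributes yet, so no node can run the group.
print(choose_node("primary", 100, {"primary": None, "backup": None}))  # None
# The backup's attribute arrives first: the primary preference is swamped.
print(choose_node("primary", 100, {"primary": None, "backup": 1}))     # backup
# Both attributes populated: the primary preference decides again.
print(choose_node("primary", 100, {"primary": 1, "backup": 1}))        # primary
```

The last case only shows where the group would be placed from scratch; whether an already running group moves back to the primary is outside this model.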


do you mean the pingd scores haven't stabilized?
or that they're equal and you can't make the resource start on a
particular node?

They stabilized, but as I said, I can't guarantee that the resource will
start on the primary node. Sometimes it starts on the primary, sometimes
on the backup node.


>
> I hope that I explained the problem correctly. A possible fix could
> be to implement a short timeout (OCF_RESKEY_dampen + 3s, for example)
> in the start function of the pingd RA.
>
> There were also some mistakes in the v2/faq/pingd document, which I
> corrected on wiki.linux-ha.org.

thanks!
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems