Logs and cib.xml are attached. As you can see from the configuration, TestGroup should start on the node called goodman (and it does if I configure pingd in ha.cf), but it starts on miller.
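To make the suspected race concrete, here is a simplified Python model of how the two location rules could combine for TestGroup. It is a sketch under stated assumptions, not the CRM's actual code. It assumes that location scores simply add up, that INFINITY is 1000000, and that -INFINITY wins over everything, including +INFINITY (the score arithmetic later documented for Pacemaker). The preference value of 100 for goodman is hypothetical. The model also ignores resource stickiness, which decides whether a group already running on miller would move back once goodman's attribute appears.

```python
# Simplified, hypothetical model of CRM location scoring for TestGroup.
# Assumptions: scores add, INFINITY == 1000000, and -INFINITY dominates.
from typing import Dict, Optional

INFINITY = 1_000_000  # assumed CRM value of INFINITY


def add_scores(a: int, b: int) -> int:
    """Add two location scores with INFINITY saturation; -INFINITY always wins."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


def node_scores(pingd: Dict[str, Optional[int]], primary: str, preference: int) -> Dict[str, int]:
    """Score each node for the group.

    pingd maps node name -> pingd attribute value (None = not in the CIB yet).
    Rule 1: -INFINITY if the attribute is undefined or <= 0.
    Rule 2: `preference` points for the primary node.
    """
    scores = {}
    for node, value in pingd.items():
        score = 0
        if value is None or value <= 0:
            score = add_scores(score, -INFINITY)
        if node == primary:
            score = add_scores(score, preference)
        scores[node] = score
    return scores


def choose_node(scores: Dict[str, int]) -> Optional[str]:
    """Return the highest-scoring node, or None if every node is -INFINITY."""
    eligible = {node: s for node, s in scores.items() if s > -INFINITY}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


if __name__ == "__main__":
    # pingd has not updated the CIB anywhere yet: no node can run the group
    print(choose_node(node_scores({"goodman": None, "miller": None}, "goodman", 100)))  # None
    # miller's attribute arrives first: the group is placed on miller
    print(choose_node(node_scores({"goodman": None, "miller": 1}, "goodman", 100)))  # miller
    # both attributes present: goodman wins on the preference rule
    print(choose_node(node_scores({"goodman": 1, "miller": 1}, "goodman", 100)))  # goodman
```

The middle case is the situation described below: if the placement decision happens while only miller has a pingd value, goodman is excluded by the -INFINITY rule no matter how large its preference score is.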
On 6/6/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On 6/5/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > On 6/5/07, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > > Hello -
> > >
> > > I played with pingd in v2 heartbeat and found some problems (or
> > > inconveniences) there:
> > >
> > > My configuration includes a group of resources and a rsc_location rule
> > > for a primary node. If I configure pingd in ha.cf and add an
> > > rsc_location rule with score -INFINITY for the pingd attribute being
> > > undefined or less than or equal to 0, everything works like it should.
> > > My group starts on the primary node and fails over to the backup node
> > > if the primary loses its network connection.
> > >
> > > Problems start when I move pingd from ha.cf to cib.xml and configure a
> > > clone for it there. It looks like (I'm not absolutely sure about that)
> > > when pingd starts up it doesn't have enough time to update the CIB
> > > before Heartbeat starts other resources.
> >
> > do you have ordering constraints between the pingd resource and the
> > other resources?
>
> Putting in ordering constraints didn't help. Probably constraints plus a
> timeout in the RA would help. I'm going to test it.
>
> > > Because of that, Heartbeat
> > > complains that there are no nodes available for resources or that
> > > resources can't run on any node in the cluster.
> >
> > presumably because there are no pingd scores yet - that's perfectly normal so far
>
> Absolutely true.
>
> > > On the second check,
> > > Heartbeat sees nodes available, but at this point there is no guarantee
> > > that resources will be started on the desired primary node.
> >
> > this bit I'm not sure I understand
>
> OK, here I tried to explain that after the pingd scores have been populated,
> the other rsc_location rule (the one that defines the primary box) gets ignored.

ignored? no way.

> That's probably because the pingd score for the secondary box gets populated
> a bit earlier than for the primary.

do you have logs showing this?
any delay should be extremely negligible.

> >
> > do you mean the pingd scores haven't stabilized?
> > or that they're equal and you can't make the resource start on a
> > particular node?
>
> They stabilized, but as I said I can't guarantee that the resource starts
> on the primary node. Sometimes it starts on the primary, sometimes on the
> backup node.
>
> > >
> > > I hope that I explained the problem correctly. A possible fix could
> > > be implementing a short timeout (OCF_RESKEY_dampen + 3s for example)
> > > in the start function of the pingd RA.
> > >
> > > There were also some mistakes in the v2/faq/pingd document that I
> > > corrected in wiki.linux-ha.org
> >
> > thanks!
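The start-delay fix proposed in the thread (OCF_RESKEY_dampen + 3s in the start function of the pingd RA) can be sketched as follows. This is a hypothetical illustration in Python; the real RA is a shell script, and this is not its code. Instead of a fixed sleep, the sketch polls until the pingd attribute becomes visible and gives up after dampen + margin seconds, so a start returns as soon as the value is in place. The `read_attribute` callable stands in for querying the CIB and is deliberately left abstract, and the dampen suffixes accepted by `dampen_to_seconds` ("ms", "s", "m", "min") are an assumption, not the RA's documented format.

```python
# Hypothetical sketch: wait in "start" until the pingd attribute is visible,
# but never longer than dampen + margin seconds (the "dampen + 3s" idea).
import re
import time
from typing import Callable, Optional

# Seconds per unit for the dampen suffixes this sketch assumes.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "min": 60.0}


def dampen_to_seconds(dampen: str) -> float:
    """Convert a dampen value such as "5", "5s", "500ms" or "1min" to seconds.

    A bare number is treated as seconds (an assumption of this sketch).
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(ms|min|m|s)?\s*", dampen)
    if match is None:
        raise ValueError(f"unrecognised dampen value: {dampen!r}")
    unit = match.group(2) or "s"
    return float(match.group(1)) * _UNIT_SECONDS[unit]


def wait_for_attribute(
    read_attribute: Callable[[], Optional[str]],
    dampen: str,
    margin: float = 3.0,
    poll_interval: float = 0.5,
) -> bool:
    """Poll read_attribute() until it returns a value or the deadline passes.

    Returns True if the attribute appeared, False on timeout.
    """
    deadline = time.monotonic() + dampen_to_seconds(dampen) + margin
    while True:
        if read_attribute() is not None:
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(poll_interval, remaining))


if __name__ == "__main__":
    calls = {"n": 0}

    def fake_reader() -> Optional[str]:
        # Stand-in for a CIB query: the value shows up on the third poll.
        calls["n"] += 1
        return "1" if calls["n"] >= 3 else None

    print(wait_for_attribute(fake_reader, "2s", margin=0, poll_interval=0.1))  # True
```

Polling keeps the worst case equal to the fixed delay proposed above, while the common case finishes as soon as the attribute has been written. Whether either version would also fix the placement issue depends on the CRM not computing placement before the pingd clone's start has completed, which is what the ordering constraints discussed above are meant to guarantee.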
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
