On Wed, Aug 08, 2007 at 01:07:19PM +0200, Andreas Kurz wrote:
> Hello all,
>
> I am running a two-node test cluster (heartbeat 2.1.2) using pingd as
> an OCF resource and encountered the following behaviour in my
> configuration:
>
> - I disabled clusterwide resource monitoring to restart heartbeat
>   on one node, because lrmd was not working as expected
What was it doing?

> - "/etc/init.d/heartbeat stop" hung indefinitely so I killed all
>   heartbeat processes and the second node stonithed the other as
>   expected, the resources were not started on the second node because
>   they were unmanaged

There must have been a reason for that. Logs and the CIB should
provide more details.

> - when the first node was up again and integrated again in the cluster
>   I reenabled clusterwide resource monitoring

What do you mean by "clusterwide resource monitoring"?

> - now the resources were all started on the second node, with its
>   higher weight because of the already running pingd and its
>   score_attributes

If pingd was running on all nodes then the resources should have
moved to their preferred node.

> Now my question is: Is it possible to configure heartbeat to always
> wait for all pingd clone-instances to be started before the
> calculation of the scores for other resources (where a constraint with
> a pingd score_attribute exists) ?

This is an interesting question: if I got it right, you are talking
about the delay between pingd being started and pingd updating the
attributes. Since it is not possible to establish how long it would
take for the program (in this case pingd) to obtain the data necessary
to update the attributes, it wouldn't make sense to wait for the
update. However, once the CIB changes through that update, the CRM
will recalculate scores and move resources if appropriate.

> The only idea I had was to start pingd from ha.cf or to stop pingd
> also on the second node before reenabling the resource monitoring to
> allow a "clean" resource placing.

But why didn't pingd run on the first node? Shouldn't it run if the
node is eligible to run the resources? Isn't that the point of it
after all, to establish that the node is connected?
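To illustrate the recalculation described above, here is a rough toy model in Python. It is not the CRM's actual placement algorithm: the node names, the base scores, the attribute value of 100, and the `place` helper are all made up for illustration, and resource stickiness is ignored. It only shows why a node whose pingd attribute has not been written yet loses at first, and why placement can change once the attribute arrives.

```python
def place(base_scores, pingd_attrs):
    """Toy placement: pick the node with the highest total score.

    base_scores: hypothetical per-node location scores for a resource.
    pingd_attrs: per-node pingd attribute values currently in the CIB.
                 A node with no attribute yet contributes nothing,
                 mimicking a rule that only matches when the attribute
                 is defined.
    """
    totals = {
        node: base + pingd_attrs.get(node, 0)
        for node, base in base_scores.items()
    }
    best = max(totals, key=totals.get)
    return best, totals


# Hypothetical scores: node1 is the preferred node.
base = {"node1": 100, "node2": 50}

# Step 1: node1 has just rejoined; only node2's pingd has set its attribute.
print(place(base, {"node2": 100}))

# Step 2: node1's pingd updates the CIB; scores are recalculated.
print(place(base, {"node1": 100, "node2": 100}))
```

In this sketch the first call picks node2 and the second picks node1. With a non-zero stickiness the resources could well stay where they are, which the model leaves out.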
Cheers,

Dejan

> Regards,
> Andreas
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
