On 8/8/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> On Wed, Aug 08, 2007 at 01:07:19PM +0200, Andreas Kurz wrote:
> > Hello all,
> >
> > I am running a two-node test cluster (heartbeat 2.1.2) using pingd as
> > an OCF resource and encountered the following behaviour in my
> > configuration:
> >
> > - I disabled clusterwide resource monitoring to restart heartbeat on
> > one node, because lrmd was not working as expected
>
> What was it doing?

I only know what it was not doing: it was not executing monitors, and
monitor actions initiated by the DC (after a 'crm_resource -P') timed out.

> > - "/etc/init.d/heartbeat stop" hung indefinitely, so I killed all
> > heartbeat processes and the second node stonithed the first as
> > expected; the resources were not started on the second node because
> > they were unmanaged
>
> There must have been a reason for that. Logs and the CIB should
> provide more details.

I will start an extra thread for this and the lrmd respawn problem.

>
> > - when the first node was up again and had rejoined the cluster,
> > I reenabled clusterwide resource monitoring
>
> What do you mean by "clusterwide resource monitoring"?

By that I mean the 'is-managed-default' option in the crm_config
section: false to disable, true to reenable.

>
> > - now the resources were all started on the second node, which had the
> > higher weight because of the already running pingd and its
> > score_attributes
>
> If pingd was running on all nodes then the resources should have
> moved to their preferred node.

It was only running on the second node, because the first node was stonithed.
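
To illustrate the situation, here is a toy model in Python. It is not
heartbeat or CRM code; the node names and the value 100 are made up,
and it assumes a constraint rule that only matches on nodes where the
pingd attribute is defined and then contributes the attribute value as
the score.

```python
from typing import Dict, Optional

# Toy model only: hypothetical names, not part of heartbeat.
NodeAttrs = Dict[str, str]


def placement_score(attrs: NodeAttrs, attribute: str = "pingd") -> Optional[int]:
    """Score contributed by a score_attribute-style rule.

    Returns None when the attribute is undefined on the node, which is
    different from an attribute that is defined with the value 0.
    """
    value = attrs.get(attribute)
    if value is None:
        return None
    return int(value)


def preferred_node(cluster: Dict[str, NodeAttrs]) -> str:
    """Pick the node with the highest contribution from the rule.

    A node without the attribute gets no contribution (treated as 0 here).
    """
    def contribution(name: str) -> int:
        score = placement_score(cluster[name])
        return score if score is not None else 0

    return max(sorted(cluster), key=contribution)


cluster = {
    "node1": {},                # rebooted after STONITH, pingd never started
    "node2": {"pingd": "100"},  # pingd running, attribute already set
}
print(preferred_node(cluster))
```

In this model node2 always wins as long as node1 has no pingd
attribute, no matter how good the connectivity of node1 really is.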

> > Now my question is: Is it possible to configure heartbeat to always
> > wait for all pingd clone-instances to be started before the
> > calculation of the scores for other resources (where a constraint with
> > a pingd score_attribute exists) ?
>
> This is an interesting question: if I got it right, you are
> talking about the delay between pingd being started and updating
> the attributes. Since it is not possible to establish how long
> it would take for the program (in this case pingd) to obtain the data
> necessary to update the attributes, it wouldn't make sense to wait
> for the update. However, once the CIB changes through that
> update, the CRM will recalculate scores and move resources if
> appropriate.

I am talking about the behaviour of the CRM when 'is-managed-default'
is reenabled in a cluster where pingd is not running everywhere, so
that some nodes have a pingd node attribute and some do not. As pingd
is a resource that influences the score of nodes when placing other
resources, I think it would be nice to have pingd started on all nodes
_before_ all other resources, at least when there are constraints that
include pingd score_attributes.

>
> > The only idea I had was to start pingd from ha.cf or to stop pingd
> > on the second node as well before reenabling the resource monitoring,
> > to allow a "clean" resource placement.
>
> But why didn't pingd run on the first node? Shouldn't it
> run if the node is eligible to run the resources? Isn't that the
> point of it after all, to establish that the node is connected?

Yes, of course ... but as described above, the first node was stonithed,
and with 'is-managed-default' disabled no resource, and therefore no
pingd, was started after the reboot.

To sum it up, my question is: if pingd is used to influence the score
of nodes in resource placement decisions, and the CRM finds that some
of the nodes involved in those decisions have no pingd attribute (not 0,
but undefined), wouldn't it be a nice feature to start or restart pingd
on those nodes?
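
The check I have in mind could look roughly like the following Python
sketch. It is only an illustration of the idea, not existing CRM
behaviour; the function name and the data layout are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical sketch of the proposed check, not heartbeat code.


def nodes_missing_attribute(
    cluster: Dict[str, Dict[str, str]],
    candidates: Iterable[str],
    attribute: str = "pingd",
) -> List[str]:
    """Return the candidate nodes on which the attribute is undefined.

    A node whose attribute is defined as "0" is NOT reported: 0 means
    pingd ran and reached nothing, undefined means pingd never reported.
    """
    return [
        name for name in candidates
        if attribute not in cluster.get(name, {})
    ]


cluster = {
    "node1": {},               # no pingd attribute at all
    "node2": {"pingd": "0"},   # pingd ran, but no ping node reachable
}
print(nodes_missing_attribute(cluster, ["node1", "node2"]))
```

The CRM would then start (or restart) pingd on the nodes returned here
before it places the resources that depend on the pingd score.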

Regards,
Andreas

Attachment: pe-input-8.bz2
Description: BZip2 compressed data

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
