Dejan Muhamedagic wrote:
On Thu, Aug 23, 2007 at 01:29:38PM -0400, Brian O'Neill wrote:
I can confirm that we are seeing this on a 2.0.7 installation currently
(upgrading not a good option - production cluster).
I'd still suggest upgrading. Many things have been fixed since
2.0.7. lrmd was also thoroughly examined. If you have a spare
cluster to test the new release, upgrading is really a
breeze---just put the cluster in an unmanaged mode, upgrade, and
then put it back to the managed mode.
I'll bring it up with the customer whose services are on it. We are
actually hoping that they may eliminate the need for floating an IP
around and go fully load-balanced soon anyway, but that may be a ways off.
lrmd has produced
more than 273k of these messages in 7 hours. The cluster itself has been
up since January 31st.
It has also produced over 100k of the following:
Aug 23 06:27:37 node0 lrmd: [4583]: WARN: on_repeat_op_readytorun:
Operations list for admin0_ip is suspicously long [69]
This is not an error. What it says is that there are too many
operations queued for that resource, where "too many" is four.
That could be a sign that some operations are taking too long to
execute, more than the interval defined for the monitor (I guess
that it is a monitor operation). But, is that really a group?
Because no operations should be defined for a group.
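
To see which resources trigger this warning and how long their queues get, the log can be summarized with a short script. This is only a sketch: the function name summarize_long_lists is made up for illustration, and the pattern is derived solely from the single log line quoted above, so other lrmd versions may format the message differently.

```python
import re
from collections import Counter

# Pattern built from the one warning quoted in this thread. The misspelling
# "suspicously" is intentional: that is how the message appears in the log.
LONG_LIST = re.compile(
    r"lrmd: \[\d+\]: WARN: on_repeat_op_readytorun:\s+"
    r"Operations list for (?P<rsc>\S+) is suspicously long \[(?P<n>\d+)\]"
)

def summarize_long_lists(text):
    """Return {resource: (number_of_warnings, longest_queue_seen)}."""
    counts = Counter()
    longest = {}
    for m in LONG_LIST.finditer(text):
        rsc, n = m.group("rsc"), int(m.group("n"))
        counts[rsc] += 1
        longest[rsc] = max(longest.get(rsc, 0), n)
    return {rsc: (counts[rsc], longest[rsc]) for rsc in counts}

# The sample is the line from this thread, wrapped the same way the mail
# client wrapped it; \s+ in the pattern tolerates the line break.
sample = (
    "Aug 23 06:27:37 node0 lrmd: [4583]: WARN: on_repeat_op_readytorun:\n"
    "Operations list for admin0_ip is suspicously long [69]\n"
)
print(summarize_long_lists(sample))
```

Run against the whole log file (read into one string), this would show whether admin0_ip is the only resource affected and whether the queue keeps growing.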
You are correct, it is one resource in the group...it is just an IPaddr
resource.
Yes, this should happen very very seldom. Are there any other
warnings? Such as, for example, about operations delayed or
max_child_count reached. At any rate, you should investigate this
thoroughly. You can also post the configuration and logs.
Yes, I found these as well:
Aug 19 06:31:10 node0 lrmd: [4583]: WARN: perform_ra_op: the operation
operation monitor[353] on ocf::IPaddr::admin0_ip for client 4586, its
parameters: CRM_meta_interval=[15000] ip=[10.16.8.30]
CRM_meta_op_target_rc=[7] netmask=[27] CRM_meta_id=[admin0_ip_mon_id]
CRM_meta_timeout=[3000] crm_feature_set=[1.0.6] CRM_meta_nam stayed in
operation list for 6420 ms (longer than 5000 ms)
It appears to be logging this extremely frequently as well - not sure
why I didn't notice it. This is a simple IPaddr resource - nothing fancy
about it.
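
For anyone wanting to compare the numbers in that warning, here is a rough Python sketch that pulls the timing fields out of the message. The function name parse_delayed_op and the returned field names are hypothetical, and the patterns are based only on the one message quoted above; the message is truncated in the log ("CRM_meta_nam"), so missing parameters are returned as None rather than guessed.

```python
import re

def parse_delayed_op(msg):
    """Extract timing fields from a (possibly line-wrapped) perform_ra_op warning."""
    flat = " ".join(msg.split())  # undo the line wrapping added by the mail client
    # Parameters are logged as name=[value] pairs.
    params = dict(re.findall(r"(\w+)=\[([^\]]*)\]", flat))
    op = re.search(r"operation (\w+)\[(\d+)\] on (\S+)", flat)
    wait = re.search(
        r"stayed in operation list for (\d+) ms \(longer than (\d+) ms\)", flat
    )
    if not (op and wait):
        return None

    def as_int(key):
        # The log line may be truncated, so a parameter can be absent.
        return int(params[key]) if key in params else None

    return {
        "op": op.group(1),
        "resource": op.group(3),
        "interval_ms": as_int("CRM_meta_interval"),
        "timeout_ms": as_int("CRM_meta_timeout"),
        "waited_ms": int(wait.group(1)),
        "threshold_ms": int(wait.group(2)),
    }

sample = """Aug 19 06:31:10 node0 lrmd: [4583]: WARN: perform_ra_op: the operation
operation monitor[353] on ocf::IPaddr::admin0_ip for client 4586, its
parameters: CRM_meta_interval=[15000] ip=[10.16.8.30]
CRM_meta_op_target_rc=[7] netmask=[27] CRM_meta_id=[admin0_ip_mon_id]
CRM_meta_timeout=[3000] crm_feature_set=[1.0.6] CRM_meta_nam stayed in
operation list for 6420 ms (longer than 5000 ms)"""

info = parse_delayed_op(sample)
print(info)
```

For the message above this gives a 15000 ms monitor interval, a 3000 ms timeout, and 6420 ms spent waiting in the operation list against a 5000 ms warning threshold.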
_______________________________________________
Linux-HA mailing list
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems