Re: [Linux-HA] lrmd CPU intensive?

Dejan Muhamedagic Thu, 23 Aug 2007 14:43:07 -0700

On Thu, Aug 23, 2007 at 01:29:38PM -0400, Brian O'Neill wrote:
> Dejan Muhamedagic wrote:
> >On Thu, Aug 23, 2007 at 04:18:53PM +0200, Christian Rishøj wrote:
> >>In the syslog, I see a lot of activity like this:
> >>
> >>Aug 23 16:00:19 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361099980 should have started at 4361099880
> >>Aug 23 16:02:21 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch
> >>function for SIGCHLD was delayed 1000 ms (> 100 ms) before being
> >>called (GSource: 0x619158)
> >>Aug 23 16:02:21 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361112160 should have started at 4361112060
> >>Aug 23 16:04:22 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch
> >>function for SIGCHLD was delayed 1000 ms (> 100 ms) before being
> >>called (GSource: 0x619158)
> >>Aug 23 16:04:22 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361124338 should have started at 4361124238
> >>
> >>Any clues what is going on here?
> >
> >Yes. See the discussion in
> >
> >http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1684
> >
> 
> I can confirm that we are seeing this on a 2.0.7 installation currently 
> (upgrading not a good option - production cluster).


I'd still suggest upgrading. Many things have been fixed since
2.0.7, and lrmd in particular was thoroughly examined. If you have
a spare cluster on which to test the new release, the upgrade
itself is really a breeze: put the cluster into unmanaged mode,
upgrade, and then put it back into managed mode.

> lrmd has produced 
> more than 273k of these messages in 7 hours. The cluster itself has been 
> up since January 31st.
> 
> It has also produced over 100k of the following:
> 
> Aug 23 06:27:37 node0 lrmd: [4583]: WARN: on_repeat_op_readytorun: 
> Operations list for admin0_ip is suspicously long [69]

This is not an error. It says that too many operations are queued
for that resource, where the threshold for "too many" is four.
That could be a sign that some operations take longer to execute
than the interval defined for the monitor (I guess that it is a
monitor operation). But is admin0_ip really a group? No operations
should be defined for a group.
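
To see how long the queue actually gets for each resource, the
warnings can be summarized from the syslog. What follows is only a
minimal sketch in Python, assuming each warning sits on one line in
the format quoted above; the name queue_lengths is made up for
illustration and is not part of lrmd or heartbeat.

```python
import re
from collections import defaultdict

# Pattern for the lrmd warning quoted above. The misspelling
# "suspicously" is copied from the log text on purpose.
OP_LIST_RE = re.compile(
    r"on_repeat_op_readytorun:\s+Operations list for (\S+) "
    r"is suspicously long \[(\d+)\]"
)


def queue_lengths(lines):
    """Return {resource: (warning_count, longest_queue)}.

    Hypothetical helper: `lines` is any iterable of syslog lines.
    """
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = OP_LIST_RE.search(line)
        if match:
            resource, length = match.group(1), int(match.group(2))
            stats[resource][0] += 1
            stats[resource][1] = max(stats[resource][1], length)
    return {name: (count, longest)
            for name, (count, longest) in stats.items()}
```

Feeding it an open log file (for example the result of open() on
wherever syslog writes on that node) gives one entry per resource.
A longest queue far above four would fit the guess that operations
run longer than the monitor interval.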

> "admin0_ip" is a resource group that includes an IPaddr and a custom 
> Apache OCF script, and that's it. Don't know if it is related or 
> something else is amiss - can't find any reference to this error.

Yes, this should happen very rarely. Are there any other warnings,
for example about delayed operations or max_child_count being
reached? At any rate, you should investigate this thoroughly. You
can also post the configuration and logs.
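
A quick way to check for other warnings is to tally the lrmd
messages by severity and reporting function. This is a rough
sketch in Python, assuming the line format of the log excerpts
quoted in this thread; the ERROR label and the name
tally_lrmd_warnings are assumptions for illustration, not
something taken from the lrmd source.

```python
import re
from collections import Counter

# Severity and function name as they appear in the quoted lines,
# e.g. "lrmd: [14365]: WARN: G_SIG_dispatch: ...".
# Matching "ERROR" as a second severity label is an assumption.
LRMD_RE = re.compile(r"lrmd: \[\d+\]: (WARN|ERROR): (\w+):")


def tally_lrmd_warnings(lines):
    """Count lrmd WARN/ERROR lines per (severity, function) pair."""
    counts = Counter()
    for line in lines:
        match = LRMD_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts
```

Calling counts.most_common() on the result lists the noisiest
message types first, which makes it easier to spot anything besides
G_SIG_dispatch and on_repeat_op_readytorun.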

> -Brian
> 
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
