On Thu, Aug 23, 2007 at 01:29:38PM -0400, Brian O'Neill wrote:
> Dejan Muhamedagic wrote:
> >On Thu, Aug 23, 2007 at 04:18:53PM +0200, Christian Rishøj wrote:
> >>In the syslog, I see a lot of activity like this:
> >>
> >>Aug 23 16:00:19 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361099980 should have started at 4361099880
> >>Aug 23 16:02:21 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch
> >>function for SIGCHLD was delayed 1000 ms (> 100 ms) before being
> >>called (GSource: 0x619158)
> >>Aug 23 16:02:21 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361112160 should have started at 4361112060
> >>Aug 23 16:04:22 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch
> >>function for SIGCHLD was delayed 1000 ms (> 100 ms) before being
> >>called (GSource: 0x619158)
> >>Aug 23 16:04:22 dub lrmd: [14365]: info: G_SIG_dispatch: started at
> >>4361124338 should have started at 4361124238
> >>
> >>Any clues what is going on here?
> >
> >Yes. See the discussion in
> >
> >http://old.linux-foundation.org/developer_bugzilla/show_bug.cgi?id=1684
> >
>
> I can confirm that we are seeing this on a 2.0.7 installation currently
> (upgrading not a good option - production cluster).

I'd still suggest upgrading. Many things have been fixed since
2.0.7, and lrmd was also thoroughly examined. If you have a spare
cluster to test the new release, upgrading is really a breeze: just
put the cluster into unmanaged mode, upgrade, and then put it back
into managed mode.

> lrmd has produced
> more than 273k of these messages in 7 hours. The cluster itself has been
> up since January 31st.
>
> It has also produced over 100k of the following:
>
> Aug 23 06:27:37 node0 lrmd: [4583]: WARN: on_repeat_op_readytorun:
> Operations list for admin0_ip is suspicously long [69]

This is not an error. It says that too many operations are queued
for that resource, where "too many" is four. That could be a sign
that some operations are taking too long to execute, longer than
the interval defined for the monitor (I guess that it is a monitor
operation). But is that really a group? No operations should be
defined for a group.

> "admin0_ip" is a resource group that includes an IPaddr and a custom
> Apache OCF script, and that's it. Don't know if it is related or
> something else is amiss - can't find any reference to this error.

Yes, this should happen very seldom. Are there any other warnings,
for example about operations being delayed or max_child_count being
reached? At any rate, you should investigate this thoroughly. You
can also post the configuration and logs.

> -Brian
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
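
When looking for other warnings, it may help to tally the lrmd messages by severity and reporting function first. Below is a minimal sketch in Python, not a Heartbeat tool. It assumes each message sits on a single syslog line (the lines quoted above were wrapped by the mail client) and that the format is exactly the one shown in this thread. The names tally_lrmd_messages, LRMD_LINE and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Matches the lrmd line format quoted in this thread, e.g.
#   Aug 23 16:02:21 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch function ...
# Assumption: other lrmd versions may log in a different format.
LRMD_LINE = re.compile(
    r"lrmd: \[(?P<pid>\d+)\]: (?P<severity>\w+): (?P<function>\w+):"
)


def tally_lrmd_messages(lines):
    """Count lrmd log lines by (severity, function name)."""
    counts = Counter()
    for line in lines:
        match = LRMD_LINE.search(line)
        if match:
            counts[(match.group("severity"), match.group("function"))] += 1
    return counts


# Sample lines taken from the messages quoted above, rejoined onto one line each.
SAMPLE = [
    "Aug 23 16:00:19 dub lrmd: [14365]: info: G_SIG_dispatch: started at "
    "4361099980 should have started at 4361099880",
    "Aug 23 16:02:21 dub lrmd: [14365]: WARN: G_SIG_dispatch: Dispatch "
    "function for SIGCHLD was delayed 1000 ms (> 100 ms) before being "
    "called (GSource: 0x619158)",
    "Aug 23 06:27:37 node0 lrmd: [4583]: WARN: on_repeat_op_readytorun: "
    "Operations list for admin0_ip is suspicously long [69]",
]

if __name__ == "__main__":
    # Print the most frequent (severity, function) pairs first.
    for (severity, function), count in tally_lrmd_messages(SAMPLE).most_common():
        print(f"{count:6d}  {severity:5s} {function}")
```

To run it against a real log, pass an open file object instead of SAMPLE, using whichever file syslog writes to on your system.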
