On Thu, Jan 13, 2011 at 10:55 AM, Lars Ellenberg <lars.ellenb...@linbit.com> wrote:

> On Thu, Jan 13, 2011 at 10:17:40AM -0600, Igor Chudov wrote:
> > Again, after about 3-4 days of running, heartbeat master process dies
> > with SIGXCPU.
> >
> > I was fortunate to run strace -p on it, so I captured strace. It looks
> > like boring, garden variety regular work, and then heartbeat dies with
> > SIGXCPU. The output is a bit lengthy.
> >
> > Is there some way to turn OFF the timeout on CPU?
>
> heartbeat sources,
> heartbeat/heartbeat.c,
> look out for cl_cpu_limit_setpercent
> which itself is defined in glue sources,
> glue/lib/clplumbing/cpulimits.c
>
> There the head comment block explains the intention of it:
>  * This allows us to better catch runaway realtime processes that
>  * might otherwise hang the whole system (if they're POSIX realtime
>  * processes).
>  *
>  * We do this by getting a "lease" on CPU time, and then ....
>
> You could of course simply kill invocations of it.
>
> It would be interesting to know what heartbeat spends its cpu time on,
> though, so maybe you can try to profile it?
>
> It should usually not consume that much cpu.

Lars, in my uneducated opinion, the bug is that the CPU limit is being set
incorrectly.

I watched heartbeat for a while with ps | grep, and its CPU use is very low.

I think it is just a dumb, garden-variety bug.

i

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
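
For reference, a minimal sketch of the mechanism behind SIGXCPU, assuming the
limit is enforced through the standard RLIMIT_CPU resource limit (the thread
does not show the cpulimits.c code itself). RLIMIT_CPU counts cumulative CPU
seconds, not a percentage, so a process with very low CPU use can still reach
a fixed limit after days of running if the limit is never raised. The helper
name run_with_cpu_limit is hypothetical, and the sketch needs a POSIX system.

```python
import os
import resource
import signal


def run_with_cpu_limit(soft_seconds: int) -> int:
    """Fork a child that burns CPU under a soft RLIMIT_CPU of soft_seconds.

    Returns the number of the signal that terminated the child,
    or 0 if the child exited without being killed by a signal.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: never return into the caller's code.
        try:
            # Make sure SIGXCPU has its default (fatal) action.
            signal.signal(signal.SIGXCPU, signal.SIG_DFL)
            # Lower only the soft limit; keep the existing hard limit.
            _, hard = resource.getrlimit(resource.RLIMIT_CPU)
            resource.setrlimit(resource.RLIMIT_CPU, (soft_seconds, hard))
            # Consume CPU time until the kernel delivers SIGXCPU.
            while True:
                pass
        finally:
            # Reached only if setting up the limit failed.
            os._exit(1)
    # Parent process: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0


if __name__ == "__main__":
    sig = run_with_cpu_limit(1)
    print("child terminated by signal", sig)
```

With a soft limit of 1 second, the child should be terminated by SIGXCPU once
it has used about one second of CPU time, regardless of how long that takes in
wall-clock time.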