On Thu, Jan 13, 2011 at 10:55 AM, Lars Ellenberg
<lars.ellenb...@linbit.com> wrote:

> On Thu, Jan 13, 2011 at 10:17:40AM -0600, Igor Chudov wrote:
> > Again, after about 3-4 days of running, the heartbeat master process dies
> > with SIGXCPU.
> >
> > I was fortunate to have strace -p running on it, so I captured a trace. It
> > looks like boring, garden-variety regular work, and then heartbeat dies
> > with SIGXCPU. The output is a bit lengthy.
> >
> > Is there some way to turn OFF the timeout on CPU?
>
> heartbeat sources,
>  heartbeat/heartbeat.c,
>  look out for cl_cpu_limit_setpercent
> which itself is defined in glue sources,
>  glue/lib/clplumbing/cpulimits.c
>
> There the head comment block explains the intention of it:
>  * This allows us to better catch runaway realtime processes that
>  * might otherwise hang the whole system (if they're POSIX realtime
>  * processes).
>  *
>  * We do this by getting a "lease" on CPU time, and then ....
>
> You could of course simply remove the invocations of it.
>
> It would be interesting to know what heartbeat spends its CPU time on,
> though, so maybe you can try to profile it?
>
> It should usually not consume that much CPU.
>
>
Lars, in my uneducated opinion, the bug is that the CPU limit is being set
incorrectly.

I did watch heartbeat for a while with ps | grep, and its CPU use is very
low.
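
A low reading in ps does not by itself rule out hitting the limit, since
RLIMIT_CPU counts accumulated CPU seconds rather than a percentage. A
back-of-the-envelope sketch in C of that arithmetic follows. The function
name and any numbers fed to it are made up for illustration and are not
taken from heartbeat or glue:

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of heartbeat or glue.
 * Given an average CPU usage (fraction of one CPU, e.g. 0.01 for 1%),
 * a soft RLIMIT_CPU in seconds, and the CPU seconds already consumed,
 * estimate how many wall-clock seconds pass before the kernel would
 * deliver SIGXCPU, assuming the limit is never raised in the meantime.
 */
double wall_seconds_until_sigxcpu(double cpu_fraction,
                                  double soft_limit_s,
                                  double used_cpu_s)
{
    assert(cpu_fraction > 0.0);
    if (used_cpu_s >= soft_limit_s)
        return 0.0;             /* limit already reached */
    return (soft_limit_s - used_cpu_s) / cpu_fraction;
}
```

With purely illustrative inputs of 1% average use and a 3000 second limit
that is never raised, this comes out at 300000 seconds of wall-clock time,
roughly three and a half days. So a process can look idle in ps and still
run out eventually if its limit stays fixed.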

I think that it is just a dumb, garden variety bug.
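
For reference, this is roughly how the "lease" idea in the comment quoted
above could work: periodically push the soft RLIMIT_CPU a little ahead of
the CPU time used so far. It is only a sketch under stated assumptions,
built on plain getrusage/getrlimit/setrlimit. The names renew_cpu_lease and
cpu_seconds_used are invented here, and this is not a copy of what
cpulimits.c does:

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU seconds (user + system) this process has consumed so far. */
static double cpu_seconds_used(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;
    return (double)ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
}

/*
 * Hypothetical lease renewal, not the glue implementation.
 * Sets the soft RLIMIT_CPU to "CPU seconds used so far + lease_s",
 * clamped to the hard limit.  Returns 0 on success, -1 on error.
 * If this stops being called, the soft limit stays where it is and
 * the kernel sends SIGXCPU once accumulated CPU time reaches it.
 */
int renew_cpu_lease(unsigned lease_s)
{
    struct rlimit rl;
    double used = cpu_seconds_used();
    rlim_t want;

    assert(lease_s > 0);
    if (used < 0.0 || getrlimit(RLIMIT_CPU, &rl) != 0)
        return -1;

    /* +1 rounds the fractional part of "used" up to a whole second. */
    want = (rlim_t)used + 1 + lease_s;
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;

    rl.rlim_cur = want;
    return setrlimit(RLIMIT_CPU, &rl);
}
```

If something like renew_cpu_lease(60) is called from a periodic timer, a
well-behaved process never reaches its soft limit. If the renewal is
skipped, or the new limit is computed from the wrong base, even a mostly
idle process would eventually be killed, which is the kind of bug I
suspect here.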

i
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
