On Thu, 2013-01-10 at 13:58 -0600, Shawn Bohrer wrote:

> Here is the output:
> 
> [   81.278842] SysRq : Changing Loglevel
> [   81.279027] Loglevel set to 9
> [   83.285456] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   85.286452] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   85.289625] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   87.287435] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   87.290718] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   89.288469] Initial want: -50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1000000000
> [   89.291550] Initial want: 150000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 800000000
> [   89.292940] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   89.294082] Initial want: 100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 850000000
> [   89.295194] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   89.296274] Initial want: 50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 900000000
> [   90.959004] [sched_delayed] sched: RT throttling activated
> [   91.289470] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.292767] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.294037] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.295364] Initial want: 200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 750000000
> [   91.296355] BUG triggered, want: 200000000
> [   91.296355] 
> [   91.296355] rt_rq[7]:
> [   91.296355]   .rt_nr_running                 : 0
> [   91.296355]   .rt_throttled                  : 0
> [   91.296355]   .rt_time                       : 0.000000
> [   91.296355]   .rt_runtime                    : 750.000000
> [   91.307332] Initial want: -50000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1000000000
> [   91.308440] Initial want: -100000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1050000000
> [   91.309586] Initial want: -150000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1100000000
> [   91.310716] Initial want: -200000000 rt_b->rt_runtime: 950000000 rt_rq->rt_runtime: 1150000000
> [   91.311707] BUG triggered, want: -200000000
> [   91.311707] 
> [   91.311707] rt_rq[6]:
> [   91.311707]   .rt_nr_running                 : 1
> [   91.311707]   .rt_throttled                  : 0
> [   91.311707]   .rt_time                       : 307.209987
> [   91.311707]   .rt_runtime                    : 1150.000000

That makes about as much sense as my crash did.  There is no leak, but
the cpu walk found nada.  So rd/span is changing on us?  We saw nada as
we traversed, released the locks, and poof, it's there.  In my dump, at
crash time, rd/span was fine, but at the top of the stack I found a
mighty suspicious reference to def_root_domain.span.  At _crash_ time,
both it and rq->rd.span had pilfer-able goodies, but whatever it was
that we traversed had nada just a wee bit earlier.

If the darn thing would trigger for me again, I'd use trace_printk(),
check rd and cpumask_weight(rd->span) before/after the traversal, print
cpu and rt_runtime as we traverse, set /proc/sys/kernel/ftrace_dump_on_oops
to 1.. and hope the bug doesn't like to play heisenbug games.
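That knob is just a sysctl; nothing exotic (a config fragment, to dump
the ftrace ring buffer to the console when the BUG fires so the
trace_printk() breadcrumbs survive):

```shell
# Dump the ftrace ring buffer to the console on oops/BUG
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops
# equivalently:
sysctl -w kernel.ftrace_dump_on_oops=1
```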

-Mike 


