>-----Original Message----- >From: Thomas Gleixner [mailto:[email protected]] >Sent: Friday, October 26, 2012 5:55 PM >To: He, Bo >Cc: [email protected]; Peter Zijlstra; Ingo Molnar; >[email protected]; Zhang, Yanmin >Subject: Re: [PATCH] hrtimer:__run_hrtimer races with enqueue_hrtimer > >On Fri, 26 Oct 2012, he, bo wrote: >> From: Yanmin Zhang <[email protected]> >> >> We hit a kernel panic at __run_hrtimer=>BUG_ON(timer->state != >HRTIMER_STATE_CALLBACK). >> <2>[ 10.226053, 3] kernel BUG at >/home/android/xiaobing/ymz/r4/hardware/intel/linux-2.6/kernel/hrtimer.c:1228 >! >> >> Basically, __run_hrtimer has a race with enqueue_hrtimer. When >> __run_hrtimer calls the timer callback fn, another thread might call >> enqueue_hrtimer or hrtimer_start to requeue it, and the timer->state >> is equal to HRTIMER_STATE_CALLBACK|HRTIMER_STATE_ENQUEUED, which >> causes the BUG_ON(timer->state != HRTIMER_STATE_CALLBACK) checking >> fails. >> >> The patch fixes it by checking only bit HRTIMER_STATE_CALLBACK. > >This does not fix it. It makes it worse. Thanks for your kind comments. We found that and sent V2 at https://lkml.org/lkml/2012/10/26/172.
> >> Signed-off-by: Yanmin Zhang <[email protected]> >> Reviewed-by: He, Bo <[email protected]> >> --- >> kernel/hrtimer.c | 2 +- >> 1 files changed, 1 insertions(+), 1 deletions(-) >> >> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c >> index 6db7a5e..6280184 100644 >> --- a/kernel/hrtimer.c >> +++ b/kernel/hrtimer.c >> @@ -1235,7 +1235,7 @@ static void __run_hrtimer(struct hrtimer *timer, >ktime_t *now) >> * hrtimer_start_range_ns() or in hrtimer_interrupt() >> */ >> if (restart != HRTIMER_NORESTART) { >> - BUG_ON(timer->state != HRTIMER_STATE_CALLBACK); >> + BUG_ON(!(timer->state & HRTIMER_STATE_CALLBACK)); >> enqueue_hrtimer(timer, base); >> } > >What you are allowing here is enqueueing an already enqueued timer >again. I don't know why this does not explode elsewhere, but that's >probably pure luck. It's not allowed to double enqueue a timer. You are right. > >So no, this is not a solution. The problem is not in the core timer >code, the problem is in the code which uses that timer. > >Your code is returning HRTIMER_RESTART from the timer callback and at >the same time it starts the timer from some other context. That's what >needs to be fixed. The timer user should fix it. But could we also change hrtimer to make it more stable? At least, instead of panic, could we print some information and go ahead to let kernel continue? Thanks, Yanmin -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

