Indeed, my crash dump occurred on kernel 2.6.30 (RH 6.5 distribution), and in this version remove_hrtimer() doesn't preserve the HRTIMER_STATE_CALLBACK flag, which triggers the hard lockup. So this bug was already fixed in version 2.6.35, where remove_hrtimer() preserves the flag.
It seems that the migrate_hrtimer_list() case isn't a problem, because it is called when the old CPU is already dead.
I'll mark the bug report as resolved.

Thanks

Itzcak Pechtalt

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Linus Torvalds
Sent: Tuesday, September 02, 2014 7:09 PM
To: Itzcak Pechtalt; Thomas Gleixner
Cc: [email protected]
Subject: Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt <[email protected]> wrote:
>
> I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this
> subject with full description.
> There is also a short fix patch for kernel/hrtimer.c file.
> Even if this bug occurs rarely, however it resolves system hard lockup option.

The patch is whitespace-damaged, but with a small oneliner like this that
doesn't much matter (the timer files moved to kernel/time/ during this
merge window, so the patch wouldn't apply as-is anyway).

It needs a sign-off (see Documentation/SubmittingPatches), but even more
importantly it needs to go to the right people for double-checking.

But the patch is more broken than whitespace and even lack of sign-off.
It cannot even have compiled. I'm assuming "timer_state" was intended to
be "timer->state".

Also, every caller but one already has "HRTIMER_STATE_CALLBACK" set
unconditionally or to the old state in "newstate", so I suspect if this
patch is the real fix (which I'll leave for Thomas to comment more on),
afaik the actual problem can only happen through migrate_hrtimer_list()
which unconditionally sets the whole state to HRTIMER_STATE_MIGRATE.

Thomas? Leaving damaged patch quoted below.

          Linus

> I suspect that it was targeted by mistake to not active list
> ([email protected]).
> Following is the fix patch based on kernel 3.16.1 (just simple):
> diff -uNr a/kernel/hrtimer.c b/kernel/hrtimer.c
> --- a/kernel/hrtimer.c 2014-08-31 20:59:52.177452123 +0300
> +++ b/kernel/hrtimer.c 2014-08-31 21:02:14.972166540 +0300
> @@ -941,7 +941,7 @@
> if (!timerqueue_getnext(&base->active))
> base->cpu_base->active_bases &= ~(1 << base->index);
> out:
> - timer->state = newstate;
> + timer->state = (newstate | (timer_state & HRTIMER_STATE_CALLBACK));
> }
>
> /*
>
> Is there a chance for this patch fix to insert into next kernel release?
>
> Thanks
>
> Itzcak Pechtalt
>
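
For readers following the thread, the intent of the one-line change can be sketched in isolation. This is a minimal userspace illustration, not kernel code: struct sketch_timer and both sketch_remove_* helpers are hypothetical names, the bit values are stand-ins for the kernel's state flags, and "timer->state" is used where the quoted patch wrote "timer_state", as Linus assumed was intended.

```c
#include <assert.h>

/* Stand-in values for the hrtimer state bits (assumed for illustration). */
#define HRTIMER_STATE_INACTIVE 0x00UL
#define HRTIMER_STATE_ENQUEUED 0x01UL
#define HRTIMER_STATE_CALLBACK 0x02UL

/* Hypothetical, stripped-down timer: only the state word is modeled. */
struct sketch_timer {
	unsigned long state;
};

/* Unpatched behaviour: the new state overwrites every old bit. */
static void sketch_remove_overwrite(struct sketch_timer *timer,
				    unsigned long newstate)
{
	timer->state = newstate;
}

/* Patched behaviour: carry the CALLBACK bit over if it was set. */
static void sketch_remove_preserve(struct sketch_timer *timer,
				   unsigned long newstate)
{
	timer->state = newstate | (timer->state & HRTIMER_STATE_CALLBACK);
}

/* Small self-check contrasting the two variants on the same input. */
static void sketch_self_check(void)
{
	struct sketch_timer a = { HRTIMER_STATE_ENQUEUED | HRTIMER_STATE_CALLBACK };
	struct sketch_timer b = a;

	sketch_remove_overwrite(&a, HRTIMER_STATE_INACTIVE);
	sketch_remove_preserve(&b, HRTIMER_STATE_INACTIVE);

	assert(a.state == HRTIMER_STATE_INACTIVE);  /* CALLBACK bit lost */
	assert(b.state == HRTIMER_STATE_CALLBACK);  /* CALLBACK bit kept */
}
```

With the overwrite variant, a timer removed while its callback is still running no longer looks "in callback", which is the window the bug report describes. Whether any 3.16 caller can actually reach that window is the question left open for Thomas above.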

