[android-kernel] rcu_barrier stall

Robert Beckett Fri, 03 Feb 2012 11:48:37 -0800

Hello,

Does anyone reading this list know much about the rcu subsystem?

I have been debugging a problem with unmounting disks. Occasionally, when unmounting an ext4 filesystem, the whole system would freeze.

I traced this to the unmount waiting for completion on an rcu_barrier.

After lots of debugging, I found that the problem was that when scheduling the rcu barrier callback on each cpu (_rcu_barrier in kernel/rcutree.c), one of the cpus had just entered a cpu_idle loop, waiting on a timer with a max timeout.

The on_each_cpu call uses IPI calls to schedule the callback on each cpu. This exits the pm_idle call, the IPI interrupt is handled, and the callback is called. It schedules the barrier callback on this cpu (see __call_rcu in kernel/rcutree.c), but does not kick off the rcu core to start handling the callback because interrupts are disabled (we are in an interrupt handler, so interrupts are correctly disabled). It then exits the interrupt handler for the IPI, but nothing has set the idle thread as needing a resched, so it stays within the inner loop of cpu_idle and waits for the massive timer to expire.

It looks to me like something either needs to wake up the idle cpu when an rcu callback is scheduled on it (I couldn't figure out how to do that), or the callback should not be scheduled on a completely idle cpu, as this cpu is already in a quiescent state.

The fix I made was to change the inner loop of cpu_idle so that it keeps looping only while (!need_resched() && !rcu_pending(smp_processor_id())) holds. This allows the IPI call which scheduled the rcu callback to break out of the inner loop when the interrupt handler is exited, because the newly queued rcu callback has caused rcu_pending to be true.
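To make the difference concrete, here is a rough user-space model of the idle loop with and without the extra check. This is only a sketch, not kernel code: struct model_cpu, model_ipi_queue_rcu_callback and model_idle_loop are hypothetical names invented for illustration, the two flags merely stand in for need_resched() and rcu_pending(), and the max_sleeps cap stands in for the long timer timeout.

```c
#include <stdbool.h>

/* Hypothetical model of one CPU's state; none of this is a kernel API. */
struct model_cpu {
    bool need_resched;  /* stands in for need_resched() */
    bool rcu_pending;   /* stands in for rcu_pending(smp_processor_id()) */
};

/*
 * Models the IPI handler: it queues an RCU callback on this CPU
 * but does not mark the idle thread as needing a resched.
 */
static void model_ipi_queue_rcu_callback(struct model_cpu *cpu)
{
    cpu->rcu_pending = true;
}

/*
 * Models the inner loop of cpu_idle. Each iteration stands for one
 * sleep that is ended by an interrupt. The IPI arrives during sleep
 * number ipi_at. With check_rcu == false the loop condition is the
 * plain !need_resched test; with check_rcu == true it also exits
 * once an RCU callback is pending.
 *
 * Returns the number of sleeps taken before the loop exits, or
 * max_sleeps if it never exits on its own (the modelled stall).
 */
static int model_idle_loop(struct model_cpu *cpu, bool check_rcu,
                           int ipi_at, int max_sleeps)
{
    int sleeps = 0;

    while (!cpu->need_resched && !(check_rcu && cpu->rcu_pending)) {
        if (sleeps == max_sleeps)
            return max_sleeps;  /* gave up: still stuck in idle */
        sleeps++;
        if (sleeps == ipi_at)
            model_ipi_queue_rcu_callback(cpu);
    }
    return sleeps;
}
```

Calling model_idle_loop with check_rcu set to false and the IPI arriving in the first sleep runs until the max_sleeps cap, which mirrors the stall I am seeing. With check_rcu set to true, the same call returns after the first sleep, because the queued callback makes the loop condition false as soon as the handler returns.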

Can anyone comment on whether this is in fact a bug, and if so, whether this is a reasonable fix? (I suspect that there will be a more elegant solution, but I don't have time to discover it.)

(I am running on a quad core A9 CPU, with CONFIG_PREEMPT_NONE and CONFIG_NO_HZ).


Thanks

Bob

--
unsubscribe: [email protected]
website: http://groups.google.com/group/android-kernel
