Hello,

Does anyone on this list know much about the RCU subsystem?

I have been debugging a problem with unmounting disks. Occasionally, unmounting an ext4 filesystem freezes the whole system.
I traced this to the unmount waiting for an rcu_barrier() to complete.

After lots of debugging, I found the cause. While the RCU barrier callback was being scheduled on each CPU (_rcu_barrier in kernel/rcutree.c), one of the CPUs had just entered the cpu_idle loop, waiting on a timer with a maximum timeout. The on_each_cpu() call uses IPIs to schedule the callback on each CPU. On the idle CPU, the IPI wakes it out of the pm_idle call, the interrupt is handled, and the callback is called. That queues the barrier callback on this CPU (see __call_rcu in kernel/rcutree.c), but does not kick the RCU core to start processing the callback, because interrupts are disabled (we are in an interrupt handler, so interrupts are correctly disabled). The CPU then exits the IPI handler, but nothing has marked the idle thread as needing a reschedule, so it stays within the inner loop of cpu_idle and waits for the very long timer to expire.

It looks to me as though either something needs to wake up the idle CPU when an RCU callback is queued on it (I couldn't figure out how to do that), or the callback should not be queued on a completely idle CPU at all, since that CPU is already in a quiescent state.

A fix that I made was to change the inner loop of cpu_idle so that it keeps looping only while (!need_resched() && !rcu_pending(smp_processor_id())). This lets the CPU leave the inner loop as soon as the IPI handler that queued the RCU callback returns, because the newly queued callback makes rcu_pending() true.
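To make the race and the proposed change concrete, here is a rough userspace model of the inner idle loop. It is not kernel code: every name in it (struct model_cpu, model_pm_idle, model_idle_inner_loop) is hypothetical, standing in for need_resched(), rcu_pending(smp_processor_id()) and pm_idle(). It also assumes, as a simplification, that once the long timer finally expires something sets need_resched and the CPU makes progress.

```c
#include <stdbool.h>

/* Hypothetical model of one idle CPU; none of these are kernel APIs. */
struct model_cpu {
    bool need_resched;   /* stands in for TIF_NEED_RESCHED on the idle task */
    int  rcu_cbs_queued; /* callbacks queued by the modeled __call_rcu() */
    long ipi_at;         /* time the rcu_barrier IPI arrives, -1 for none */
    long timer_at;       /* expiry time of the long idle timer */
    long now;            /* current simulated time */
};

/* Models pm_idle(): sleep until the next interrupt. */
static void model_pm_idle(struct model_cpu *c)
{
    if (c->ipi_at >= c->now && c->ipi_at < c->timer_at) {
        /* The IPI arrives first: its handler queues the barrier callback
         * but, with interrupts disabled, neither kicks the RCU core nor
         * sets need_resched. */
        c->now = c->ipi_at;
        c->ipi_at = -1;
        c->rcu_cbs_queued++;
        return;
    }
    /* Otherwise we sleep until the long timer expires. Assumption: that
     * finally lets the CPU leave the idle loop. */
    c->now = c->timer_at;
    c->need_resched = true;
}

/* Models the inner loop of cpu_idle. With check_rcu == false this is the
 * original condition (!need_resched()); with check_rcu == true it is the
 * proposed (!need_resched() && !rcu_pending(cpu)). Returns the simulated
 * time at which the CPU leaves the loop. */
static long model_idle_inner_loop(struct model_cpu *c, bool check_rcu)
{
    while (!c->need_resched && !(check_rcu && c->rcu_cbs_queued > 0))
        model_pm_idle(c);
    return c->now;
}
```

With an IPI at time 1 and the timer at time 1000000, the original condition only leaves the loop at 1000000, after the long timer expires, while the rcu_pending() check leaves it at time 1, right after the IPI handler returns.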

Can anyone comment on whether this is in fact a bug, and if so, whether this is a reasonable fix? (I suspect there is a more elegant solution, but I don't have time to find it.)

(I am running on a quad-core Cortex-A9 CPU, with CONFIG_PREEMPT_NONE and CONFIG_NO_HZ.)

Thanks

Bob

--
unsubscribe: [email protected]
website: http://groups.google.com/group/android-kernel