We extensively researched the problem.

 

The TLB flush softlockup is only a CONSEQUENCE of a deadlock.

 

Background: a TLB flush is issued by one CPU to a number of other CPUs
using inter-processor interrupts (IPIs) to propagate paging changes. The
issuing CPU then loops until all of those processors acknowledge the change.
If one of them is deadlocked on a spinlock, the acknowledgement never
arrives, and the softlockup eventually triggers. The deadlock arises on a
spinlock, and this lock may sometimes be held by user code (through the
/proc or /sys interfaces of modules).
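
To make the mechanism concrete, here is a minimal user-space sketch of the
wait loop described above. It is NOT kernel code: struct sim_cpu, the sim_*
functions and the spin budget are all hypothetical names invented for this
illustration, and a CPU deadlocked on a spinlock is simply modelled as one
that never acknowledges the request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical model of one CPU; not a kernel structure. */
struct sim_cpu {
    bool stuck_on_spinlock; /* deadlocked: modelled as unable to acknowledge */
    bool acked;             /* has acknowledged the flush request */
};

/* Simulated IPI delivery: every target CPU that is not stuck acknowledges. */
void sim_deliver_ipi(struct sim_cpu *cpus, size_t ncpus, size_t self)
{
    for (size_t i = 0; i < ncpus; i++) {
        if (i == self)
            continue;
        cpus[i].acked = !cpus[i].stuck_on_spinlock;
    }
}

/* True once every CPU other than the issuer has acknowledged. */
bool sim_all_acked(const struct sim_cpu *cpus, size_t ncpus, size_t self)
{
    for (size_t i = 0; i < ncpus; i++) {
        if (i != self && !cpus[i].acked)
            return false;
    }
    return true;
}

/*
 * Issuer side: send the request, then spin until everyone has acknowledged.
 * Returns the number of spins needed, or -1 when the spin budget runs out,
 * which stands in for the point where a soft lockup would be reported.
 */
long sim_flush_tlb_others(struct sim_cpu *cpus, size_t ncpus, size_t self,
                          long spin_budget)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS && self < ncpus);

    for (size_t i = 0; i < ncpus; i++)
        cpus[i].acked = false;

    sim_deliver_ipi(cpus, ncpus, self);

    for (long spins = 0; spins < spin_budget; spins++) {
        if (sim_all_acked(cpus, ncpus, self))
            return spins;
    }
    return -1;
}
```

With four simulated CPUs and none of them stuck, the call returns at once.
Marking a single CPU as stuck makes the issuer exhaust its budget, even
though the issuer itself holds no lock. This is why the lockup report on the
flushing CPU is only a symptom and the stack of the stuck CPU is what matters.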

 

The only way to identify the root cause (i.e. which driver is causing
problems) is to dump ALL CPU stacks in the soft lockup code.

 

One way to do that is to modify the kernel and add

                arch_trigger_all_cpu_backtrace() 

in the 

                kernel/softlockup.c:softlockup_tick() 

function.

 

This is based on an NMI IPI, which ensures that all stacks are dumped, even
in the case of a deadlock (though don't expect the impossible either).

 

You should then easily find the faulty driver and be able to file the
relevant bug.

 

Hope this helps

 

François-Frédéric
