From: Hoeun Ryu <hoeun....@lge.com>

Many console device drivers hold the uart_port->lock spinlock with IRQs
disabled (using spin_lock_irqsave()) while they write characters to their
devices. If "oops_in_progress" is greater than or equal to 1, the drivers
only try to take the lock (using spin_trylock_irqsave()) instead, to avoid
deadlocks.

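The locking pattern described above can be sketched in userspace C. This is
only an illustration under stated assumptions, not code from any driver:
struct sketch_lock, oops_in_progress_sketch and the sketch_*() helpers are
hypothetical stand-ins for uart_port->lock, oops_in_progress and the
spin_lock_irqsave()/spin_trylock_irqsave() calls, and IRQ handling is not
modelled at all.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a spinlock (not the kernel type). */
struct sketch_lock {
	atomic_flag held;
};

/* Hypothetical stand-in for the kernel's oops_in_progress counter. */
static int oops_in_progress_sketch;

/* Try once to take the lock; returns true on success, never blocks. */
static bool sketch_trylock(struct sketch_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						   memory_order_acquire);
}

/* Spin until the lock is taken; blocks forever if nobody releases it. */
static void sketch_lock_spin(struct sketch_lock *l)
{
	while (!sketch_trylock(l))
		; /* busy-wait */
}

static void sketch_unlock(struct sketch_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Model of a console write path: block on the lock in the normal case,
 * but only try the lock while an oops is in progress. Returns true if
 * the lock was held during the write.
 */
static bool sketch_console_write(struct sketch_lock *l, const char *s)
{
	bool locked = true;

	if (oops_in_progress_sketch)
		locked = sketch_trylock(l);
	else
		sketch_lock_spin(l);

	fputs(s, stdout);

	/* Release only what this call actually acquired. */
	if (locked)
		sketch_unlock(l);

	return locked;
}
```

With the flag set, a write issued while the lock is already held returns
without the lock instead of spinning, which is the behaviour this patch
relies on during console_flush_on_panic().
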
There is a deadlock case involving this lock and oops_in_progress. If the
kernel lockup detector calls panic() while the device driver is holding
the lock, a deadlock can occur because panic() eventually calls
console_unlock() and tries to take the same lock. Here is an example.

 CPU0

 local_irq_save()
 .
 foo()
 bar()
 .					// foo() + bar() takes long time
 printk()
   console_unlock()
     call_console_drivers()		// close to watchdog threshold
       some_slow_console_device_write()	// device driver code
         spin_lock_irqsave(uart->lock)	// acquire uart spin lock
           slow-write()
             watchdog_overflow_callback() // watchdog expired and call panic()
               panic()
                 bust_spinlocks(0)	// now, oops_in_progress = 0
                 console_flush_on_panic()
                   console_unlock()
                     call_console_drivers()
                       some_slow_console_device_write()
                         spin_lock_irqsave(uart->lock)
                           ^^^^ deadlock // we can use spin_trylock_irqsave()

console_flush_on_panic() is called in panic() and eventually tries to take
the uart lock, but the lock is already held by the preempted code on the
same CPU (panic() is running in NMI context on top of it), so the CPU
deadlocks.

By moving bust_spinlocks(0) after console_flush_on_panic(), the console
device drivers still see the oops as in progress and call
spin_trylock_irqsave() instead of spin_lock_irqsave(), which avoids the
deadlock.

 CPU0

 watchdog_overflow_callback()		// watchdog expired and call panic()
   panic()
     console_flush_on_panic()
       console_unlock()
         call_console_drivers()
           some_slow_console_device_write()
             spin_trylock_irqsave(uart->lock) // oops_in_progress = 1
               ^^^^ use trylock, no deadlock
     bust_spinlocks(0)			// now, oops_in_progress = 0

Signed-off-by: Hoeun Ryu <hoeun....@lge.com>
---
 v2: fix the commit message on the reason for the deadlock, no code change.

 kernel/panic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index 42e4874..b4063b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -233,8 +233,6 @@ void panic(const char *fmt, ...)
 	if (_crash_kexec_post_notifiers)
 		__crash_kexec(NULL);
 
-	bust_spinlocks(0);
-
 	/*
 	 * We may have ended up stopping the CPU holding the lock (in
 	 * smp_send_stop()) while still having some valuable data in the console
@@ -246,6 +244,8 @@ void panic(const char *fmt, ...)
 	debug_locks_off();
 	console_flush_on_panic();
 
+	bust_spinlocks(0);
+
 	if (!panic_blink)
 		panic_blink = no_blink;
 
-- 
2.1.4