A deadlock on logbuf_lock can occur during panic:

        a) Panic CPU is running in non-NMI context
        b) Panic CPU sends out shutdown IPI via NMI vector
        c) One of the CPUs that we bring down via NMI vector holds logbuf_lock
        d) Panic CPU tries to take logbuf_lock, and the deadlock occurs.

We try to re-init logbuf_lock in printk_safe_flush_on_panic()
to avoid the deadlock, but that does not work here, because:

First, it is inappropriate to check num_online_cpus() here.
When CPUs are brought down via the NMI vector, the panic CPU does
not wait long for the other cores to stop, so when this problem
occurs, num_online_cpus() may still be greater than 1.

Second, printk_safe_flush_on_panic() is called after the panic
notifier callbacks, so if printk() is called in a panic notifier
callback, the deadlock will still occur. E.g., if ftrace_dump_on_oops
is set, we print some debug information, which tries to take
logbuf_lock.

To avoid this deadlock, drop the num_online_cpus() check and call
printk_safe_flush_on_panic() before the panic_notifier_list callbacks,
so that the panic CPU attempts to re-init logbuf_lock first.

Signed-off-by: Cheng Jian <cj.chengj...@huawei.com>
---
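For illustration, a minimal user-space sketch of the lock check before
and after this change. It is only a model under stated assumptions: the
model_* names are hypothetical stand-ins built on C11 atomics, not kernel
APIs, and the online_cpus argument stands in for num_online_cpus().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for logbuf_lock; not the kernel's raw_spinlock_t. */
struct model_lock {
	atomic_bool locked;
};

static void model_lock_init(struct model_lock *l)
{
	atomic_store(&l->locked, false);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool model_trylock(struct model_lock *l)
{
	return !atomic_exchange(&l->locked, true);
}

static bool model_is_locked(struct model_lock *l)
{
	return atomic_load(&l->locked);
}

/*
 * Models the check before this patch: the re-init is skipped whenever
 * more than one CPU is still reported online.
 */
static void model_flush_on_panic_old(struct model_lock *l, int online_cpus)
{
	if (model_is_locked(l)) {
		if (online_cpus > 1)
			return;
		model_lock_init(l);
	}
}

/* Models the check after this patch: re-init whenever the lock is held. */
static void model_flush_on_panic_new(struct model_lock *l)
{
	if (model_is_locked(l))
		model_lock_init(l);
}
```

In this model, a lock left held by a stopped CPU stays held after
model_flush_on_panic_old() when online_cpus is 2, so a later lock attempt
on the panic CPU would spin forever; model_flush_on_panic_new() releases
it.
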
 kernel/panic.c              | 3 +++
 kernel/printk/printk_safe.c | 3 ---
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index b69ee9e76cb2..8dbcb2227b60 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -255,6 +255,9 @@ void panic(const char *fmt, ...)
                crash_smp_send_stop();
        }
 
+       /* Call flush even twice. It tries harder with a single online CPU */
+       printk_safe_flush_on_panic();
+
        /*
         * Run any panic handlers, including those that might need to
         * add information to the kmsg dump output.
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index d9a659a686f3..9ebc1723e1a4 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -269,9 +269,6 @@ void printk_safe_flush_on_panic(void)
         * Do not risk a double release when more CPUs are up.
         */
        if (raw_spin_is_locked(&logbuf_lock)) {
-               if (num_online_cpus() > 1)
-                       return;
-
                debug_locks_off();
                raw_spin_lock_init(&logbuf_lock);
        }
-- 
2.17.1
