The commit 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when
logbuf_lock is available") tried to detect when logbuf_lock was taken
on another CPU. Then it looked safe to wait for the lock even in NMI.

It would be safe if other locks were not involved. Ironically the same
commit introduced an ABBA deadlock scenario. It added a spin lock into
nmi_cpu_backtrace() to serialize logs from different CPUs. The effect
is that also the NMI handlers are serialized. As a result, logbuf_lock
might be blocked by NMI on another CPU:

CPU0                    CPU1                    CPU2

printk()
  vprintk_emit()
    spin_lock(&logbuf_lock)

                                                trigger_all_cpu_backtrace()
                                                  raise()

                        nmi_enter()
                          printk_nmi_enter()
                            if (this_cpu_read(printk_context)
                              & PRINTK_SAFE_CONTEXT_MASK)
                              // false
                            else
                              // looks safe to use printk_deferred()
                              this_cpu_or(printk_context,
                                PRINTK_NMI_DEFERRED_CONTEXT_MASK);

                          nmi_cpu_backtrace()
                            arch_spin_lock(&lock);
                              show_regs()

nmi_enter()
  nmi_cpu_backtrace()
    arch_spin_lock(&lock);

                              printk()
                                vprintk_func()
                                  vprintk_deferred()
                                    vprintk_emit()
                                      spin_lock(&logbuf_lock)

DEADLOCK: between &logbuf_lock from vprintk_emit() and
                  &lock from nmi_cpu_backtrace().

CPU0                    CPU1
lock(logbuf_lock)       lock(lock)
  lock(lock)              lock(logbuf_lock)

I have found this problem when stress testing trigger_all_cpu_backtrace()
and the system frozen.

Note that lockdep is not able to detect these dependencies because
there is no support for NMI context. Let's stay on the safe side
and always use printk_safe buffers when logbuf_lock is taken
when entering NMI.

Fixes: 719f6a7040f1bdaf96fcc ("printk: Use the main logbuf in NMI when 
logbuf_lock is available")
Cc: 4.13+ <sta...@vger.kernel.org> # v4.13+
Signed-off-by: Petr Mladek <pmla...@suse.com>
---
 kernel/printk/printk_safe.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 449d67edfa4b..a2ebd749c053 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -310,15 +310,12 @@ void printk_nmi_enter(void)
 {
        /*
         * The size of the extra per-CPU buffer is limited. Use it only when
-        * the main one is locked. If this CPU is not in the safe context,
-        * the lock must be taken on another CPU and we could wait for it.
+        * the main one is locked.
         */
-       if ((this_cpu_read(printk_context) & PRINTK_SAFE_CONTEXT_MASK) &&
-           raw_spin_is_locked(&logbuf_lock)) {
+       if (raw_spin_is_locked(&logbuf_lock))
                this_cpu_or(printk_context, PRINTK_NMI_CONTEXT_MASK);
-       } else {
+       else
                this_cpu_or(printk_context, PRINTK_NMI_DEFERRED_CONTEXT_MASK);
-       }
 }
 
 void printk_nmi_exit(void)
-- 
2.13.6

Reply via email to