Sorry for the delayed response. On (07/24/19 14:27), Petr Mladek wrote: [..] > > And this is where the idea of "disconnecting" those CPUs from main > > logbuf come from. > > > > So what we can do: > > - smp_send_stop() > > - disconnect all-but-self from logbuf (via printk-safe) > > printk_safe is not really necessary. As you wrote, nobody > is interested into messages from CPUs that are supposed > to be stopped.
OK. Doing it in printk.c makes it easier to handle console_owner/waiter, which are not exported. > It might be enough to set some global variable, for example, > with the CPU number that is calling panic() and is the only > one allowed to print messages from this point on. Sounds good. > It might even be used to force console lock owner to leave > the cycle immediately. Yes, makes sense. [..] > > So, shall we try one more time with the "disconnect" misbehaving CPUs > > approach? I can send an RFC patch. > > IMHO, it will be acceptable only when it is reasonably simple and > straightforward. The panic() code is full of special hacks and > it is already complicated to follow all the twists. When you have a chance, mind to take a look at the patch below? Doesn't look very difficult (half of it are white-spaces and comments, I believe). > Especially because this scenario came up from a theoretical > discussion. Note that my original, real bug report, was > with logbuf_lock NMI, enabled kdump notifiers, ... I understand. The idea is that this patch should handle your real scenario + theoretical scenario. --- include/linux/printk.h | 2 ++ kernel/panic.c | 9 ++++++- kernel/printk/printk.c | 53 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/linux/printk.h b/include/linux/printk.h index 57c9473f4a81..016f5ba06e94 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -322,6 +322,8 @@ static inline void printk_safe_flush_on_panic(void) } #endif +void printk_enter_panic_mode(int cpu); + extern int kptr_restrict; #ifndef pr_fmt diff --git a/kernel/panic.c b/kernel/panic.c index d1ece4c363b9..85fac975a90f 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -254,13 +254,20 @@ void panic(const char *fmt, ...) crash_smp_send_stop(); } + /* Misbehaving secondary CPUs cannot printk() to the main logbuf now */ + printk_enter_panic_mode(this_cpu); + /* * Run any panic handlers, including those that might need to * add information to the kmsg dump output. */ atomic_notifier_call_chain(&panic_notifier_list, 0, buf); - /* Call flush even twice. It tries harder with a single online CPU */ + /* + * Call flush even twice. It tries harder with a single online CPU. + * Even if we failed to stop some of secondary CPUs we have printk + * locks re-initialized and keep secondary CPUs off printk(). + */ printk_safe_flush_on_panic(); kmsg_dump(KMSG_DUMP_PANIC); diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index f0bc37a511a7..750f83c3b589 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -87,6 +87,8 @@ static DEFINE_SEMAPHORE(console_sem); struct console *console_drivers; EXPORT_SYMBOL_GPL(console_drivers); +static atomic_t __read_mostly printk_panic_cpu = ATOMIC_INIT(PANIC_CPU_INVALID); + /* * System may need to suppress printk message under certain * circumstances, like after kernel panic happens. @@ -1644,6 +1646,49 @@ static void console_lock_spinning_enable(void) spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_); } +static int panic_in_progress_on_other_cpu(void) +{ + int cpu = atomic_read(&printk_panic_cpu); + + return cpu != PANIC_CPU_INVALID && cpu != smp_processor_id(); +} + +void printk_enter_panic_mode(int cpu) +{ + unsigned long timeout; + + cpu = atomic_cmpxchg(&printk_panic_cpu, PANIC_CPU_INVALID, cpu); + /* printk can enter panic mode only once */ + if (cpu != PANIC_CPU_INVALID) + return; + /* + * Wait for active secondary CPUs (if there are any) to leave + * console_unlock() printing loop (for up to one second). + */ + if (num_online_cpus() > 1) { + timeout = USEC_PER_SEC; + while (num_online_cpus() > 1 && timeout--) + udelay(1); + } + + debug_locks_off(); + /* + * On some platforms crash_smp_send_stop() can kill CPUs via NMI + * vector. Re-init printk() locks just in case if any of those killed + * CPUs held any of printk() locks. On platforms which don't support + * NMI stop, misbehaving secondary CPUs will be handled by + * panic_in_progress_on_other_cpu() test. + * + * We re-init only printk() locks here. oops_in_progress is expected + * to be set by now, so good console drivers are in lockless mode, + * bad console drivers, however, can deadlock. + */ + raw_spin_lock_init(&logbuf_lock); + sema_init(&console_sem, 1); + WRITE_ONCE(console_waiter, false); + raw_spin_lock_init(&console_owner_lock); +} + /** * console_lock_spinning_disable_and_check - mark end of code where another * thread was able to busy wait and check if there is a waiter @@ -1900,6 +1945,9 @@ int vprintk_store(int facility, int level, size_t text_len; enum log_flags lflags = 0; + if (panic_in_progress_on_other_cpu()) + return 0; + /* * The printf needs to come first; we need the syslog * prefix which might be passed-in as a parameter. @@ -2468,6 +2516,11 @@ void console_unlock(void) return; } + if (panic_in_progress_on_other_cpu()) { + printk_safe_exit_irqrestore(flags); + return; + } + printk_safe_exit_irqrestore(flags); if (do_cond_resched) -- 2.22.0