Hi Will,

On Fri, Aug 23, 2024 at 3:46 AM Will Deacon <[email protected]> wrote:
>
> Hi Doug,
>
> On Wed, Aug 21, 2024 at 02:53:57PM -0700, Douglas Anderson wrote:
> > When testing hard lockup handling on my sc7180-trogdor-lazor device
> > with pseudo-NMI enabled, with serial console enabled and with kgdb
> > disabled, I found that the stack crawls printed to the serial console
> > ended up as a jumbled mess. After rebooting, the pstore-based console
> > looked fine though. Also, enabling kgdb to trap the panic made the
> > console look fine and avoided the mess.
>
> Just a small nit:
>
> >         while (num_other_online_cpus() && timeout--)
> >                 udelay(1);
> >
> > -       if (num_other_online_cpus())
> > +       /*
> > +        * If CPUs are still online, try an NMI. There's no excuse for this to
> > +        * be slow, so we only give them an extra 10 ms to respond.
> > +        */
> > +       if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {
>
> We probably want an smp_rmb() here...
>
> > +               cpumask_copy(&mask, cpu_online_mask);
> > +               cpumask_clear_cpu(smp_processor_id(), &mask);
> > +
> > +               pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
> > +                       cpumask_pr_args(&mask));
> > +
> > +               smp_cross_call(&mask, IPI_CPU_STOP_NMI);
> > +               timeout = USEC_PER_MSEC * 10;
> > +               while (num_other_online_cpus() && timeout--)
> > +                       udelay(1);
> > +       }
> > +
> > +       if (num_other_online_cpus()) {
>
> ... and again here, just to make sure that the re-read of cpu_online_mask
> is ordered after the read of __num_online_cpus in num_other_online_cpus().
>
> I can add those when applying.

Sounds like a plan to me. Thanks!

-Doug
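The ordering Will describes can be sketched in userspace C11 roughly as follows. This is a hypothetical analog, not the arm64 code: online_mask, num_online, cpu_go_offline() and other_online() are made-up names standing in for cpu_online_mask, __num_online_cpus, the stopping CPU's offline path and num_other_online_cpus(). It assumes each stopping CPU clears its mask bit before decrementing the count, and it uses a C11 acquire fence as a stand-in for smp_rmb() (the C11 fence is somewhat stronger, since it also orders later stores). With the reader-side fence in place, a count that already reflects a CPU going offline cannot be paired with a stale mask that still shows that CPU as online.

```c
/* Hypothetical C11 analog of the count-then-mask read ordering.
 * None of these names are kernel APIs. */
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4u /* hypothetical CPU count for the sketch */

/* Bit i set means CPU i is online; all CPUs start online. */
static atomic_uint online_mask = (1u << NR_CPUS) - 1;
static atomic_int num_online = NR_CPUS;

/* Writer side, run by each CPU as it stops. Assumed ordering:
 * clear the mask bit first, then decrement the count. */
static void cpu_go_offline(unsigned int cpu)
{
	unsigned int bit = 1u << cpu;

	/* Only the caller that actually clears the bit decrements the count. */
	if (atomic_fetch_and_explicit(&online_mask, ~bit, memory_order_relaxed) & bit) {
		/* Order the mask update before the count update. */
		atomic_thread_fence(memory_order_release);
		atomic_fetch_sub_explicit(&num_online, 1, memory_order_relaxed);
	}
}

static unsigned int popcount(unsigned int x)
{
	unsigned int n = 0;

	for (; x; x &= x - 1) /* clear the lowest set bit each pass */
		n++;
	return n;
}

/* Reader side, run by the CPU doing the stop. Returns how many other
 * CPUs the count says are online and stores their mask in *mask. */
static int other_online(unsigned int self, unsigned int *mask)
{
	int count = atomic_load_explicit(&num_online, memory_order_relaxed) - 1;

	/* Stand-in for smp_rmb(): the mask read below must not be
	 * satisfied earlier than the count read above. */
	atomic_thread_fence(memory_order_acquire);

	*mask = atomic_load_explicit(&online_mask, memory_order_relaxed) & ~(1u << self);

	/* The mask may be newer than the count, never older, so it can
	 * never show more CPUs than the count does. */
	assert(popcount(*mask) <= (unsigned int)count);
	return count;
}
```

Run single-threaded, the sketch only shows where the fences go: on the 4-CPU setup, calling cpu_go_offline(1) and cpu_go_offline(2) and then other_online(0, &mask) returns 1 with mask 0x8 (only CPU 3 left). The assert() encodes the invariant that the fence pairing is meant to guarantee under concurrency.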
