Hi Doug,

On Wed, Aug 21, 2024 at 02:53:57PM -0700, Douglas Anderson wrote:
> When testing hard lockup handling on my sc7180-trogdor-lazor device
> with pseudo-NMI enabled, with serial console enabled and with kgdb
> disabled, I found that the stack crawls printed to the serial console
> ended up as a jumbled mess. After rebooting, the pstore-based console
> looked fine though. Also, enabling kgdb to trap the panic made the
> console look fine and avoided the mess.

Just a small nit:

>       while (num_other_online_cpus() && timeout--)
>               udelay(1);
>  
> -     if (num_other_online_cpus())
> +     /*
> +      * If CPUs are still online, try an NMI. There's no excuse for this to
> +      * be slow, so we only give them an extra 10 ms to respond.
> +      */
> +     if (num_other_online_cpus() && ipi_should_be_nmi(IPI_CPU_STOP_NMI)) {

We probably want an smp_rmb() here...

> +             cpumask_copy(&mask, cpu_online_mask);
> +             cpumask_clear_cpu(smp_processor_id(), &mask);
> +
> +             pr_info("SMP: retry stop with NMI for CPUs %*pbl\n",
> +                     cpumask_pr_args(&mask));
> +
> +             smp_cross_call(&mask, IPI_CPU_STOP_NMI);
> +             timeout = USEC_PER_MSEC * 10;
> +             while (num_other_online_cpus() && timeout--)
> +                     udelay(1);
> +     }
> +
> +     if (num_other_online_cpus()) {


... and again here, just to make sure that the re-read of cpu_online_mask
is ordered after the read of __num_online_cpus in num_other_online_cpus().

I can add those when applying.
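
For illustration only, here is a rough user-space model of the ordering
concern, not the actual hunk. Everything in it is an assumption made for
the sketch: the model_* names are hypothetical, C11 atomics stand in for
the kernel primitives, the acquire fence plays the role of smp_rmb(), and
the release on the writer side is a property of the model rather than a
statement about how the kernel updates cpu_online_mask.

```c
#include <assert.h>
#include <stdatomic.h>

/* Model state: CPUs 0-3 online, tracked as a bitmask plus a counter. */
static atomic_ulong model_online_mask = 0xFUL;
static atomic_int model_num_online = 4;

/*
 * Writer side (assumed for this model): clear the mask bit first, then
 * drop the counter with release semantics so the mask update is visible
 * to any reader that observes the new count and then fences.
 */
static void model_set_cpu_offline(int cpu)
{
	unsigned long bit;
	unsigned long old;

	assert(cpu >= 0 && cpu < 4);
	bit = 1UL << cpu;
	old = atomic_fetch_and_explicit(&model_online_mask, ~bit,
					memory_order_relaxed);
	if (old & bit)	/* only count the transition once */
		atomic_fetch_sub_explicit(&model_num_online, 1,
					  memory_order_release);
}

/* Stand-in for num_other_online_cpus(): a plain read of the counter. */
static int model_num_other_online(void)
{
	return atomic_load_explicit(&model_num_online,
				    memory_order_relaxed) - 1;
}

/*
 * Reader side: check the counter, then fence, then re-read the mask.
 * The fence keeps the mask load from being satisfied before the counter
 * load, which is the role smp_rmb() would play in the patch.
 */
static unsigned long model_remaining_mask(int self)
{
	unsigned long mask = 0;

	if (model_num_other_online() > 0) {
		atomic_thread_fence(memory_order_acquire);
		mask = atomic_load_explicit(&model_online_mask,
					    memory_order_relaxed);
		mask &= ~(1UL << self);	/* exclude the calling CPU */
	}
	return mask;
}
```

In this model, calling model_remaining_mask(0) after CPU 1 has gone offline
returns a mask with only bits 2 and 3 set. Without the fence, the mask load
would be free to be reordered ahead of the counter load on a weakly ordered
machine.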

Will
