Hi,

On Mon, Jun 24, 2024 at 6:55 AM Will Deacon <[email protected]> wrote:
>
> On Fri, May 17, 2024 at 01:01:58PM -0700, Doug Anderson wrote:
> > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <[email protected]>
> > wrote:
> > > local_irq_disable();
> >
> > The above local_irq_disable() is not new for my patch but it seems
> > wonky for two reasons:
> >
> > 1. It feels like it should have been the first thing in the function.
> >
> > 2. It feels like it should be local_daif_mask() instead.
>
> Is that to ensure we don't take a pNMI? I think that makes sense, but
> let's please add a comment to say why local_irq_disable() is not
> sufficient.

Right, that was my thought. Mostly I realized it was right because the
normal (non-crash) stop case calls local_cpu_stop() which calls
local_daif_mask(). I was comparing the two and trying to figure out if
the difference was on purpose or an oversight. Looks like an oversight
to me.

Sure, I'll add a comment.

Ironically, looking at the code again I found _yet another_ corner
case I missed: panic_smp_self_stop(). If a CPU hits that case then we
could end up waiting for it when it's already stopped itself.

I tried to figure out how to solve that properly and it dawned on me
that maybe I should rethink part of my patch. Specifically, I had
added a new `stop_mask` in this patch because the panic case didn't
update `cpu_online_mask`. ...but that's easy enough to fix: just add a
call to `set_cpu_online(cpu, false)` in ipi_cpu_crash_stop(). ...so
I'll do that and avoid adding a new mask.

If there's some reason why crash stop shouldn't be marking a CPU
offline then let me know and I'll go back...

-Doug
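
For illustration only, here is a small userspace model of the idea above: if the crash-stop path clears the CPU's bit in the online mask, the CPU that is waiting can poll that one mask and needs no separate `stop_mask`. This is a sketch under stated assumptions, not the actual patch. Every `model_*` name and `MODEL_NR_CPUS` is hypothetical and stands in for kernel facilities such as `set_cpu_online()` and `cpu_online_mask`; exception masking (local_daif_mask()) and the real parking loop are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 8 CPUs, one bit per CPU, all online at start. */
#define MODEL_NR_CPUS 8u

static unsigned long model_online_mask = (1ul << MODEL_NR_CPUS) - 1;

/* Stand-in for the kernel's set_cpu_online(cpu, online). */
static void model_set_cpu_online(unsigned int cpu, bool online)
{
	assert(cpu < MODEL_NR_CPUS);
	if (online)
		model_online_mask |= 1ul << cpu;
	else
		model_online_mask &= ~(1ul << cpu);
}

/* Count online CPUs other than the caller; the waiter polls this. */
static unsigned int model_num_other_online_cpus(unsigned int self)
{
	unsigned int cpu, count = 0;

	for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != self && (model_online_mask & (1ul << cpu)))
			count++;
	return count;
}

/*
 * Model of the crash-stop handler: mark the CPU offline first, so the
 * waiter sees it as stopped. A real handler would then mask exceptions
 * and park the CPU; that part is omitted here.
 */
static void model_crash_stop(unsigned int cpu)
{
	model_set_cpu_online(cpu, false);
}
```

In this model, a CPU that has already stopped itself (the panic_smp_self_stop() case) would also have to clear its own bit, or the waiter would still count it as running.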
