On Tue, Feb 27, 2024 at 04:57:31PM -0800, Doug Anderson wrote:
> Hi,
>
> On Mon, Jan 8, 2024 at 4:54 PM Doug Anderson <[email protected]> wrote:
> >
> > Hi,
> >
> > On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <[email protected]> wrote:
> > >
> > > When testing hard lockup handling on my sc7180-trogdor-lazor device
> > > with pseudo-NMI enabled, with serial console enabled and with kgdb
> > > disabled, I found that the stack crawls printed to the serial console
> > > ended up as a jumbled mess. After rebooting, the pstore-based console
> > > looked fine though. Also, enabling kgdb to trap the panic made the
> > > console look fine and avoided the mess.
> > >
> > > After a bit of tracking down, I came to the conclusion that this was
> > > what was happening:
> > > 1. The panic path was stopping all other CPUs with
> > >    panic_other_cpus_shutdown().
> > > 2. At least one of those other CPUs was in the middle of printing to
> > >    the serial console and holding the console port's lock, which is
> > >    grabbed with "irqsave". ...but since we were stopping with an NMI
> > >    we didn't care about the "irqsave" and interrupted anyway.
> > > 3. Since we stopped the CPU while it was holding the lock it would
> > >    never release it.
> > > 4. All future calls to output to the console would end up failing to
> > >    get the lock in qcom_geni_serial_console_write(). This isn't
> > >    _totally_ unexpected at panic time but it's a code path that's not
> > >    well tested, hard to get right, and apparently doesn't work
> > >    terribly well on the Qualcomm geni serial driver.
> > >
> > > It would probably be a reasonable idea to try to make the Qualcomm
> > > geni serial driver work better, but also it's nice not to get into
> > > this situation in the first place.
> > >
> > > Taking a page from what x86 appears to do in native_stop_other_cpus(),
> > > let's do this:
> > > 1. First, we'll try to stop other CPUs with a normal IPI and wait a
> > >    second. This gives them a chance to leave critical sections.
> > > 2. If CPUs fail to stop then we'll retry with an NMI, but give a much
> > >    lower timeout since there's no good reason for a CPU not to react
> > >    quickly to a NMI.
> > >
> > > This works well and avoids the corrupted console and (presumably)
> > > could help avoid other similar issues.
> > >
> > > In order to do this, we need to do a little re-organization of our
> > > IPIs since we don't have any more free IDs. We'll do what was
> > > suggested in previous conversations and combine "stop" and "crash
> > > stop". That frees up an IPI so now we can have a "stop" and "stop
> > > NMI".
> > >
> > > In order to do this we also need a slight change in the way we keep
> > > track of which CPUs still need to be stopped. We need to know
> > > specifically which CPUs haven't stopped yet when we fall back to NMI
> > > but in the "crash stop" case the "cpu_online_mask" isn't updated as
> > > CPUs go down. This is why that code path had an atomic of the number
> > > of CPUs left. We'll solve this by making the cpumask into a
> > > global. This has a potential memory implication--with NR_CPUs = 4096
> > > this is 4096/8 = 512 bytes of globals. On the upside in that same case
> > > we take 512 bytes off the stack which could potentially have made the
> > > stop code less reliable. It can be noted that the NMI backtrace code
> > > (lib/nmi_backtrace.c) uses the same approach and that use also
> > > confirms that updating the mask is safe from NMI.
> > >
> > > All of the above lets us combine the logic for "stop" and "crash stop"
> > > code, which appeared to have a bunch of arbitrary implementation
> > > differences. Possibly this could make up for some of the 512 wasted
> > > bytes. ;-)
> > >
> > > Aside from the above change where we try a normal IPI and then an NMI,
> > > the combined function has a few subtle differences:
> > > * In the normal smp_send_stop(), if we fail to stop one or more CPUs
> > >   then we won't include the current CPU (the one running
> > >   smp_send_stop()) in the error message.
> > > * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print
> > >   the CPUs that we failed to stop instead of printing all _but_ the
> > >   current running CPU.
> > > * In crash_smp_send_stop(), we will now only print "SMP: stopping
> > >   secondary CPUs" if (system_state <= SYSTEM_RUNNING).
> > >
> > > Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP
> > > should try for NMI")
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > ---
> > > I'm not setup to test the crash_smp_send_stop(). I made sure it
> > > compiled and hacked the panic() method to call it, but I haven't
> > > actually run kexec. Hopefully others can confirm that it's working for
> > > them.
> > >
> > >  arch/arm64/kernel/smp.c | 115 +++++++++++++++++++---------------------
> > >  1 file changed, 54 insertions(+), 61 deletions(-)
> >
> > The sound of crickets is overwhelming. ;-) Does anyone have any
> > comments here? Is this a terrible idea? Is this the best idea you've
> > heard all year (it's only been 8 days, so maybe)? Is this great but
> > the implementation is lacking (at best)? Do you hate that this waits
> > for 1 second and wish it waited for 1 ms? 10 ms? 100 ms? 8192 ms?
> >
> > Aside from the weirdness of a processor being killed while holding the
> > console lock, it does seem beneficial to give IRQs at least a little
> > time to finish before killing a processor. I don't have any other
> > explicit examples, but I could just imagine that things might be a
> > little more orderly in such a case...
>
> I'm still hoping to get some sort of feedback here. If people think
> this is a terrible idea then I'll shut up now and leave well enough
> alone, but it would be nice to actively decide and get the patch out
> of limbo.

I've read the patch through a couple of times and was generally convinced
by the "do what x86 does" argument. However, until now I've always held
my counsel since I wasn't familiar with these code paths and I figured
it was OK for me to have no opinion because the first line of the
description says that kgdb/kdb is 100% not involved in causing the
problem ;-) .

However, today I also took a look at the HAVE_NMI architectures and
there is no consensus between them about how to implement this: PowerPC
uses NMI and most of the others use IRQ only, while s390 special-cases
the panic code path and acts differently compared to a normal SMP
shutdown.

FWIW the x86 route was irq-only and then switched to irq-plus-nmi
(after a short trial with NMI-only that had problems with pstore
reliability[1]) and that approach has been in place for over a decade
now!

However, if we are talking ourselves into copying x86 then perhaps we
should copy x86 more accurately! Assuming I read the x86 code correctly,
crash_smp_send_stop() will (mostly) go straight to NMI rather than
trialling an IRQ first! That is not what is currently implemented in
the patch for arm64.


Daniel.


[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7d007d21e539dbecb6942c5734e6649f720982cf
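
To make the difference between the two stop policies concrete, here is a
toy user-space model. It is only a sketch and not kernel code: the names
model_stop(), struct stop_outcome, NCPUS and STUCK are invented for
illustration, and the model assumes that a CPU inside an "irqsave"
critical section ignores a normal IPI until the section ends but always
reacts to an NMI. It compares "IRQ first, NMI for stragglers" against
"straight to NMI" and reports which CPUs were stopped in the middle of a
critical section, which is the case that left the console lock held.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4		/* hypothetical CPU count for the model */
#define STUCK (-1)	/* CPU never leaves its critical section */

/* Bitmasks of CPUs: bit n set means CPU n. */
struct stop_outcome {
	unsigned int via_irq;	/* stopped by the normal IPI */
	unsigned int via_nmi;	/* stopped by the NMI */
	unsigned int cut_off;	/* stopped by NMI inside a critical section */
};

/*
 * crit_ms[cpu] is how long that CPU still runs with IRQs disabled:
 * 0 means IRQs are enabled now, STUCK means they never will be.
 * self is the CPU doing the stopping and is never stopped.
 */
static struct stop_outcome model_stop(const int crit_ms[NCPUS], int self,
				      int irq_timeout_ms, bool irq_first)
{
	struct stop_outcome out = { 0, 0, 0 };

	assert(self >= 0 && self < NCPUS);

	for (int cpu = 0; cpu < NCPUS; cpu++) {
		unsigned int bit = 1u << cpu;
		/* Will this CPU take a normal IPI before the timeout? */
		bool leaves_in_time = crit_ms[cpu] != STUCK &&
				      crit_ms[cpu] <= irq_timeout_ms;

		if (cpu == self)
			continue;

		if (irq_first && leaves_in_time) {
			out.via_irq |= bit;
			continue;
		}

		/* Either we skipped the IRQ phase or this CPU ignored it. */
		out.via_nmi |= bit;
		if (crit_ms[cpu] != 0)
			out.cut_off |= bit;
	}

	return out;
}
```

In this model, with crit_ms = { 0, 5, STUCK, 0 }, CPU 0 doing the
stopping and a 1000 ms IRQ timeout, the IRQ-first policy only cuts off
the genuinely stuck CPU 2, whereas the NMI-only policy also cuts off
CPU 1, which would have left its critical section on its own after 5 ms.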
