With pseudo NMIs support available its possible to configure SGIs to be triggered as pseudo NMIs running in NMI context. And kernel features such as: - NMI backtrace can leverage IPI turned as NMI to get a backtrace of CPU stuck in hard lockup using magic SYSRQ. - kgdb relies on NMI support to round up CPUs which are stuck in hard lockup state with interrupts disabled.
This patch-set adds framework to turn an IPI as NMI which can be triggered as a pseudo NMI which in turn invokes registered NMI handlers. After this patch-set we should be able to get a backtrace for a CPU stuck in HARDLOCKUP. Have a look at an examples below from a hard lockup testcase run on Developerbox: $ echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT NMI backtrace: ============== # Issue Magic SysRq to dump backtrace [ 376.894502] NMI backtrace for cpu 8 [ 376.894506] CPU: 8 PID: 555 Comm: bash Not tainted 5.9.0-rc3-00740-g06ff047-dirty #242 [ 376.894510] Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr 6 2020 [ 376.894514] pstate: 40000005 (nZcv daif -PAN -UAO BTYPE=--) [ 376.894517] pc : lkdtm_HARDLOCKUP+0x8/0x18 [ 376.894520] lr : lkdtm_do_action+0x24/0x30 [ 376.894524] sp : ffff800012cebd20 [ 376.894527] pmr_save: 00000060 [ 376.894530] x29: ffff800012cebd20 x28: ffff000875ae8000 [ 376.894540] x27: 0000000000000000 x26: 0000000000000000 [ 376.894550] x25: 000000000000001a x24: ffff800012cebe40 [ 376.894560] x23: 000000000000000b x22: ffff800010fc5040 [ 376.894569] x21: ffff000878b61000 x20: ffff8000113b2870 [ 376.894579] x19: 000000000000001b x18: 0000000000000010 [ 376.894588] x17: 0000000000000000 x16: 0000000000000000 [ 376.894598] x15: ffff000875ae8470 x14: 00000000000002ad [ 376.894613] x13: 0000000000000000 x12: 0000000000000000 [ 376.894622] x11: 0000000000000007 x10: 00000000000009c0 [ 376.894631] x9 : ffff800012ceba80 x8 : ffff000875ae8a20 [ 376.894641] x7 : ffff00087f6b3280 x6 : ffff00087f6b3200 [ 376.894651] x5 : 0000000000000000 x4 : ffff00087f6a91f8 [ 376.894660] x3 : ffff00087f6b0120 x2 : 1aa310cec69eb500 [ 376.894670] x1 : 0000000000000000 x0 : 0000000000000060 [ 376.894679] Call trace: [ 376.894683] lkdtm_HARDLOCKUP+0x8/0x18 [ 376.894686] direct_entry+0x124/0x1c0 [ 376.894689] full_proxy_write+0x60/0xb0 [ 376.894693] vfs_write+0xf0/0x230 [ 376.894696] ksys_write+0x6c/0xf8 [ 376.894699] __arm64_sys_write+0x1c/0x28 [ 376.894703] el0_svc_common.constprop.0+0x74/0x1f0 [ 376.894707] do_el0_svc+0x24/0x90 [ 376.894710] el0_sync_handler+0x180/0x2f8 [ 376.894713] el0_sync+0x158/0x180 KGDB: ===== # Enter kdb via Magic SysRq [6]kdb> btc btc: cpu status: Currently on cpu 6 Available cpus: 0-5(I), 6, 7(I), 8, 9-23(I) <snip> Stack traceback for pid 555 0xffff000875ae8000 555 554 1 8 R 0xffff000875ae89c0 bash CPU: 8 PID: 555 Comm: bash Not tainted 5.9.0-rc3-00740-g06ff047-dirty #242 Hardware name: Socionext SynQuacer E-series DeveloperBox, BIOS build #73 Apr 6 2020 Call trace: dump_backtrace+0x0/0x1a0 show_stack+0x18/0x28 dump_stack+0xc0/0x11c kgdb_cpu_enter+0x648/0x660 kgdb_nmicallback+0xa0/0xa8 ipi_kgdb_nmicallback+0x24/0x30 ipi_nmi_handler+0x48/0x60 handle_percpu_devid_fasteoi_ipi+0x74/0x88 generic_handle_irq+0x30/0x48 handle_domain_nmi+0x48/0x80 gic_handle_irq+0x18c/0x34c el1_irq+0xcc/0x180 lkdtm_HARDLOCKUP+0x8/0x18 direct_entry+0x124/0x1c0 full_proxy_write+0x60/0xb0 vfs_write+0xf0/0x230 ksys_write+0x6c/0xf8 __arm64_sys_write+0x1c/0x28 el0_svc_common.constprop.0+0x74/0x1f0 do_el0_svc+0x24/0x90 el0_sync_handler+0x180/0x2f8 el0_sync+0x158/0x180 <snip> Changes in v5: - Rebased to head of upstream master. - Remove redundant invocation of ipi_nmi_setup(). - Addressed misc. comments. Changes in v4: - Move IPI NMI framework to a separate file. - Get rid of hard-coded IPI_CALL_NMI_FUNC allocation. - Add NMI backtrace support leveraged via magic SYSRQ. Changes in v3: - Rebased to Marc's latest IPIs patch-set [1]. [1] https://lkml.org/lkml/2020/9/1/603 Changes since RFC version [1]: - Switch to use generic interrupt framework to turn an IPI as NMI. - Dependent on Marc's patch-set [2] which turns IPIs into normal interrupts. - Addressed misc. comments from Doug on patch #4. - Posted kgdb NMI printk() fixup separately which has evolved since to be solved using different approach via changing kgdb interception of printk() in common printk() code (see patch [3]). [1] https://lkml.org/lkml/2020/4/24/328 [2] https://lkml.org/lkml/2020/5/19/710 [3] https://lkml.org/lkml/2020/5/20/418 Sumit Garg (5): arm64: Add framework to turn IPI as NMI irqchip/gic-v3: Enable support for SGIs to act as NMIs arm64: smp: Allocate and setup IPI as NMI arm64: kgdb: Round up cpus using IPI as NMI arm64: ipi_nmi: Add support for NMI backtrace arch/arm64/include/asm/irq.h | 6 +++ arch/arm64/include/asm/kgdb.h | 8 ++++ arch/arm64/include/asm/nmi.h | 16 ++++++++ arch/arm64/kernel/Makefile | 2 +- arch/arm64/kernel/ipi_nmi.c | 90 +++++++++++++++++++++++++++++++++++++++++++ arch/arm64/kernel/kgdb.c | 21 ++++++++++ arch/arm64/kernel/smp.c | 8 ++++ drivers/irqchip/irq-gic-v3.c | 13 ++++++- 8 files changed, 161 insertions(+), 3 deletions(-) create mode 100644 arch/arm64/include/asm/nmi.h create mode 100644 arch/arm64/kernel/ipi_nmi.c -- 2.7.4 _______________________________________________ Kgdb-bugreport mailing list Kgdb-bugreport@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport