Excerpts from Sachin Sant's message of December 15, 2021 8:49 pm: > >> Reported-by: Sachin Sant <sach...@linux.vnet.ibm.com> >> Signed-off-by: Nicholas Piggin <npig...@gmail.com> >> --- >> I'm not entirely sure about the 32-bit / compat part. Or the 64-bit for >> that matter except that it does seem to fix the bug caused by the test >> program. >> >> Thanks, >> Nick >> >> arch/powerpc/kernel/signal_32.c | 23 ++++++++++++++++------- >> arch/powerpc/kernel/signal_64.c | 17 ++++++++++++----- >> 2 files changed, 28 insertions(+), 12 deletions(-) > > Sorry for the delayed feedback. I was observing confusing test results > so had to be sure. > > Test results are against 5.16.0-rc5-03218-g798527287598 (powerpc/merge) > > I ran some extended set of kernel self tests (from powerpc/signal and > powerpc/security) on POWER8, POWER9 and POWER10 configs. > > On POWER8 & POWER10 LPAR with this fix tests ran successfully. > > on POWER9 PowerNV with the fix and following set of configs > > CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y > CONFIG_PPC_RFI_SRR_DEBUG=y > > the tests ran successfully. > > But with the fix and following set of configs > > CONFIG_PPC_IRQ_SOFT_MASK_DEBUG=y > CONFIG_PPC_RFI_SRR_DEBUG=n > > I still a warning and then a crash: > > [ 550.568588] count-cache-flush: hardware flush enabled. > [ 550.568593] link-stack-flush: software flush enabled. > [ 550.569604] ------------[ cut here ]------------ > [ 550.569611] WARNING: CPU: 21 PID: 3784 at arch/powerpc/kernel/irq.c:288 > arch_local_irq_restore+0x22c/0x230 > [ 550.569625] Modules linked in: nft_ct nf_conntrack nf_defrag_ipv6 > nf_defrag_ipv4 ip_set nf_tables rfkill libcrc32c nfnetlink i2c_dev rpcrdma > sunrpc ib_umad rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod > ib_ipoib ib_iser rdma_cm iw_cm libiscsi ib_cm scsi_transport_iscsi mlx5_ib > ib_uverbs dm_mod ib_core ses enclosure tpm_i2c_nuvoton i2c_opal ipmi_powernv > xts ipmi_devintf uio_pdrv_genirq vmx_crypto ipmi_msghandler i2c_core opal_prd > uio ibmpowernv leds_powernv powernv_op_panel sch_fq_codel ip_tables ext4 > mbcache jbd2 mlx5_core sd_mod t10_pi sg mpt3sas ipr tg3 libata mlxfw psample > raid_class ptp scsi_transport_sas pps_core fuse > [ 550.569752] CPU: 21 PID: 3784 Comm: NetworkManager Kdump: loaded Not > tainted 5.16.0-rc5-03218-g798527287598 #8 > [ 550.569765] NIP: c0000000000171dc LR: c000000000033240 CTR: > c000000000cf1260 > [ 550.569774] REGS: c000000ae08bbab0 TRAP: 0700 Not tainted > (5.16.0-rc5-03218-g798527287598) > [ 550.569784] MSR: 9000000000021031 <SF,HV,ME,IR,DR,LE> CR: 28004444 XER: > 20040000 > [ 550.569802] CFAR: c00000000001704c IRQMASK: 1 > [ 550.569802] GPR00: c000000000033240 c000000ae08bbd50 c000000002a10600 > 0000000000000000 > [ 550.569802] GPR04: c000000ae08bbe80 00007fffaea1ece0 0000000000000000 > 0000000000000000 > [ 550.569802] GPR08: 0000000000000002 0000000000000000 0000000000000002 > 0000000001f3fb0c > [ 550.569802] GPR12: 0000000000004000 c000005fff7c9080 0000000000000000 > 0000000000000000 > [ 550.569802] GPR16: 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 > [ 550.569802] GPR20: 0000000000000000 0000000000000000 0000000000000000 > 0000000000000000 > [ 550.569802] GPR24: 0000000000000002 0000000000000001 0000000002002000 > 0000000002802000 > [ 550.569802] GPR28: 0000000000000000 0000000000000800 c000000ae08bbe80 > 0000000000040080 > [ 550.569899] NIP [c0000000000171dc] arch_local_irq_restore+0x22c/0x230 > [ 550.569909] LR [c000000000033240] > interrupt_exit_user_prepare_main+0x150/0x260 > [ 550.569919] Call Trace: > [ 550.569925] [c000000ae08bbd80] [c000000000033240] > interrupt_exit_user_prepare_main+0x150/0x260 > [ 550.569937] [c000000ae08bbde0] [c000000000033744] > syscall_exit_prepare+0x74/0x150 > [ 550.569948] [c000000ae08bbe10] [c00000000000c758] > system_call_common+0xf8/0x268
Yeah this looks like a different issue. Is there a test running which flips the security mitigations rapidly? There is a race window with the the static branch causing exit_must_hard_disable() returning two different values. We should update they key while single threaded AFAIKS. Thanks, Nick --- diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 57c6bb802f6c..a7cb317e7039 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -232,11 +232,22 @@ static DEFINE_MUTEX(exit_flush_lock); static int __do_stf_barrier_fixups(void *data) { - enum stf_barrier_type *types = data; + enum stf_barrier_type types = *(enum stf_barrier_type *)data; do_stf_entry_barrier_fixups(*types); do_stf_exit_barrier_fixups(*types); + if ((types & STF_BARRIER_FALLBACK) || (types & STF_BARRIER_SYNC_ORI)) + stf_exit_reentrant = false; + else + stf_exit_reentrant = true; + + if (stf_exit_reentrant && rfi_exit_reentrant) + static_branch_disable(&interrupt_exit_not_reentrant); + else + static_branch_enable(&interrupt_exit_not_reentrant); + + return 0; } @@ -257,18 +268,9 @@ void do_stf_barrier_fixups(enum stf_barrier_type types) // Prevent static key update races with do_rfi_flush_fixups() mutex_lock(&exit_flush_lock); - static_branch_enable(&interrupt_exit_not_reentrant); stop_machine(__do_stf_barrier_fixups, &types, NULL); - if ((types & STF_BARRIER_FALLBACK) || (types & STF_BARRIER_SYNC_ORI)) - stf_exit_reentrant = false; - else - stf_exit_reentrant = true; - - if (stf_exit_reentrant && rfi_exit_reentrant) - static_branch_disable(&interrupt_exit_not_reentrant); - mutex_unlock(&exit_flush_lock); } @@ -472,6 +474,16 @@ static int __do_rfi_flush_fixups(void *data) patch_instruction(dest + 2, ppc_inst(instrs[2])); } + if (types & L1D_FLUSH_FALLBACK) + rfi_exit_reentrant = false; + else + rfi_exit_reentrant = true; + + if (stf_exit_reentrant && rfi_exit_reentrant) + static_branch_disable(&interrupt_exit_not_reentrant); + else + static_branch_enable(&interrupt_exit_not_reentrant); + printk(KERN_DEBUG "rfi-flush: patched %d locations (%s flush)\n", i, (types == L1D_FLUSH_NONE) ? "no" : (types == L1D_FLUSH_FALLBACK) ? "fallback displacement" : @@ -495,18 +507,9 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) // Prevent static key update races with do_stf_barrier_fixups() mutex_lock(&exit_flush_lock); - static_branch_enable(&interrupt_exit_not_reentrant); stop_machine(__do_rfi_flush_fixups, &types, NULL); - if (types & L1D_FLUSH_FALLBACK) - rfi_exit_reentrant = false; - else - rfi_exit_reentrant = true; - - if (stf_exit_reentrant && rfi_exit_reentrant) - static_branch_disable(&interrupt_exit_not_reentrant); - mutex_unlock(&exit_flush_lock); }