On Mon, Sep 29, 2025 at 11:59:15AM +0200, Ard Biesheuvel wrote:
> On Fri, 26 Sept 2025 at 05:02, Kees Cook <[email protected]> wrote:
> >
> > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> > index 422ae549b65b..c3b9f16ea872 100644
> > --- a/gcc/config/arm/arm.md
> > +++ b/gcc/config/arm/arm.md
> ...
> > +/* Output the assembly for a KCFI checked call instruction. INSN is the
> > +   RTL instruction being processed. OPERANDS is the array of RTL operands
> > +   where operands[0] is the call target register, operands[2] is the KCFI
> > +   type ID constant. Returns an empty string as all output is handled by
> > +   direct assembly generation. */
> > +
> > +const char *
> > +arm_output_kcfi_insn (rtx_insn *insn, rtx *operands)
> > +{
> > +  /* KCFI type id. */
> > +  uint32_t type_id = INTVAL (operands[2]);
> > +
> > +  /* Calculate typeid offset from call target. */
> > +  HOST_WIDE_INT offset = -kcfi_typeid_offset;
> > +
> > +  /* Generate custom label names. */
> > +  char trap_name[32];
> > +  char call_name[32];
> > +  ASM_GENERATE_INTERNAL_LABEL (trap_name, "Lkcfi_trap", kcfi_labelno);
> > +  ASM_GENERATE_INTERNAL_LABEL (call_name, "Lkcfi_call", kcfi_labelno);
> > +
> > +  /* Create memory operand for the type load. */
> > +  rtx mem_op = gen_rtx_MEM (SImode,
> > +                            gen_rtx_PLUS (SImode, operands[0],
> > +                                          GEN_INT (offset)));
> > +  rtx temp_operands[6];
> > +
> > +  /* Normally we can use r12 as our scratch register. */
> > +  unsigned scratch_reg_num = IP_REGNUM;
> > +  /* If register pressure has made r12 our target register, we need to pick
> > +     a different register. We don't want to spill our target register
> > +     because on reload at the end of the KCFI check, we'd be producing
> > +     the very kind of call gadget we were trying to protect against:
> > +     "pop %target; call %target". In this case, use r3 as our scratch
> > +     register. But since r3 may be used for function arguments, we need
> > +     to check if it is being used for that and only spill/reload if that
> > +     happens. Any spill/reload of r3 due to making a call will already
> > +     have been managed by the register allocator, so we only have to care
> > +     about not clobbering the argument value it may be carrying into the
> > +     call here. Also use r3 when r12 is a fixed register. */
> > +  if (REGNO (operands[0]) == scratch_reg_num
> > +      || fixed_regs[scratch_reg_num])
> > +    scratch_reg_num = LAST_ARG_REGNUM;
> > +  rtx scratch_reg = gen_rtx_REG (SImode, scratch_reg_num);
> > +
> > +  /* We only need to spill r3 if it's actually used by the call. */
> > +  bool need_spill = (scratch_reg_num == LAST_ARG_REGNUM)
> > +                    && reg_overlap_mentioned_p (scratch_reg, insn);
> > +
> > +  /* Calculate trap immediate. */
> > +  unsigned addr_reg_num = REGNO (operands[0]);
> > +  /* The scratch register is always clobbered by eor seq: use 0x1F. */
> > +  unsigned udf_immediate = 0x8000 | (0x1F << 5) | (addr_reg_num & 31);
> > +
>
> I take it this means you still need to decode the instructions in the
> kernel to obtain the expected type id?

Currently, yes.

> Can't you insert the actual register index here, and defer the reload
> until after the UDF? That way, the scratch register will always
> contain the XOR of the actual vs expected typeids when taking the
> trap.

My instinct is to avoid any kind of load/call gadget (as a ROP target),
even if the controlled register is only the 4th argument. The risk is
much lower, but it seemed like reducing the risk to 0 requires just a
little help on the kernel side after taking the trap (and x86 already
does this reliably).

I suppose as an alternative I could use the index when it's not r3, but
then Linux would need to read the destination memory to rebuild the XOR?
I think that's even more fragile... I think it'd be best to just read
back the prior 5 instructions before the trap. It's reliable. :)

-Kees

-- 
Kees Cook
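
For illustration only, here is a minimal sketch in plain C of how the
trap immediate from the quoted hunk packs and unpacks. It is not code
from the patch or from the kernel: the helper names (kcfi_udf_encode,
kcfi_udf_target, kcfi_udf_scratch, kcfi_udf_selftest) are hypothetical,
and reading 0x1F in the middle field as "no register holds the expected
type id" is an assumption based on the patch comment.

```c
#include <assert.h>
#include <stdint.h>

/* Layout as written in the quoted patch:
   0x8000 | (scratch_field << 5) | (target_reg & 31).  */
#define KCFI_UDF_BASE 0x8000u

/* Hypothetical helper: build the UDF immediate from a 5-bit scratch
   field and the call target register number.  */
static uint16_t kcfi_udf_encode(unsigned scratch_field, unsigned target_reg)
{
    return (uint16_t)(KCFI_UDF_BASE
                      | ((scratch_field & 31u) << 5)
                      | (target_reg & 31u));
}

/* Hypothetical helper: recover the call target register (bits 4:0).  */
static unsigned kcfi_udf_target(uint16_t imm)
{
    return imm & 31u;
}

/* Hypothetical helper: recover the scratch field (bits 9:5).
   Assumption: 0x1F means the trap handler cannot find the expected
   type id in a register and has to decode the preceding
   instructions instead.  */
static unsigned kcfi_udf_scratch(uint16_t imm)
{
    return (imm >> 5) & 31u;
}

/* Round-trip check for a call through r4 with the 0x1F marker.  */
static void kcfi_udf_selftest(void)
{
    uint16_t imm = kcfi_udf_encode(0x1F, 4);
    assert(imm == 0x83E4);
    assert(kcfi_udf_target(imm) == 4);
    assert(kcfi_udf_scratch(imm) == 0x1F);
}
```

With this layout, a checked call through r4 gets the immediate 0x83E4.
The handler can read the target register straight from the low five
bits, while the 0x1F field gives it nothing to read the expected type
id from, which is why it has to look back at the instructions before
the trap.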
