Re: [PATCH v4 6/7] arm: Add ARM 32-bit Kernel Control Flow Integrity implementation

Kees Cook Thu, 02 Oct 2025 16:06:42 -0700

On Mon, Sep 29, 2025 at 11:59:15AM +0200, Ard Biesheuvel wrote:
> On Fri, 26 Sept 2025 at 05:02, Kees Cook <[email protected]> wrote:
> >
> > diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> > index 422ae549b65b..c3b9f16ea872 100644
> > --- a/gcc/config/arm/arm.md
> > +++ b/gcc/config/arm/arm.md
> ...
> > +/* Output the assembly for a KCFI checked call instruction.  INSN is the
> > +   RTL instruction being processed.  OPERANDS is the array of RTL operands
> > +   where operands[0] is the call target register, operands[2] is the KCFI
> > +   type ID constant.  Returns an empty string as all output is handled by
> > +   direct assembly generation.  */
> > +
> > +const char *
> > +arm_output_kcfi_insn (rtx_insn *insn, rtx *operands)
> > +{
> > +  /* KCFI type id.  */
> > +  uint32_t type_id = INTVAL (operands[2]);
> > +
> > +  /* Calculate typeid offset from call target.  */
> > +  HOST_WIDE_INT offset = -kcfi_typeid_offset;
> > +
> > +  /* Generate custom label names.  */
> > +  char trap_name[32];
> > +  char call_name[32];
> > +  ASM_GENERATE_INTERNAL_LABEL (trap_name, "Lkcfi_trap", kcfi_labelno);
> > +  ASM_GENERATE_INTERNAL_LABEL (call_name, "Lkcfi_call", kcfi_labelno);
> > +
> > +  /* Create memory operand for the type load.  */
> > +  rtx mem_op = gen_rtx_MEM (SImode,
> > +                           gen_rtx_PLUS (SImode, operands[0],
> > +                                         GEN_INT (offset)));
> > +  rtx temp_operands[6];
> > +
> > +  /* Normally we can use r12 as our scratch register.  */
> > +  unsigned scratch_reg_num = IP_REGNUM;
> > +  /* If register pressure has made r12 our target register, we need to pick
> > +     a different register.  We don't want to spill our target register
> > +     because on reload at the end of the KCFI check, we'd be producing
> > +     the very kind of call gadget we were trying to protect against:
> > +     "pop %target; call %target".  In this case, use r3 as our scratch
> > +     register.  But since r3 may be used for function arguments, we need
> > +     to check if it is being used for that and only spill/reload if that
> > +     happens.  Any spill/reload of r3 due to making a call will already
> > +     have been managed by the register allocator, so we only have to care
> > +     about not clobbering the argument value it may be carrying into the
> > +     call here.  Also use r3 when r12 is a fixed register.  */
> > +  if (REGNO (operands[0]) == scratch_reg_num
> > +      || fixed_regs[scratch_reg_num])
> > +    scratch_reg_num = LAST_ARG_REGNUM;
> > +  rtx scratch_reg = gen_rtx_REG (SImode, scratch_reg_num);
> > +
> > +  /* We only need to spill r3 if it's actually used by the call.  */
> > +  bool need_spill = (scratch_reg_num == LAST_ARG_REGNUM)
> > +                   && reg_overlap_mentioned_p (scratch_reg, insn);
> > +
> > +  /* Calculate trap immediate.  */
> > +  unsigned addr_reg_num = REGNO (operands[0]);
> > +  /* The scratch register is always clobbered by eor seq: use 0x1F.  */
> > +  unsigned udf_immediate = 0x8000 | (0x1F << 5) | (addr_reg_num & 31);
> > +
> 
> I take it this means you still need to decode the instructions in the
> kernel to obtain the expected type id?


Currently, yes.

> Can't you insert the actual register index here, and defer the reload
> until after the UDF? That way, the scratch register will always
> contain the XOR of the actual vs expected typeids when taking the
> trap.

My instinct is to avoid any kind of load/call gadget (as a ROP target),
even if the controlled register is only the 4th argument. The risk is
much lower, but it seemed like reducing the risk to 0 requires just a
little help on the kernel side after taking the trap (and x86 already
does this reliably).

I suppose as an alternative I could use the index when it's not r3, but
then Linux would need to read the destination memory to rebulid the XOR?
I think that's even more fragile... I think it'd be best to just read
back the prior 5 instructions before the trap. It's reliable. :)

-Kees

-- 
Kees Cook

Re: [PATCH v4 6/7] arm: Add ARM 32-bit Kernel Control Flow Integrity implementation

Reply via email to