On Thu, Sep 11, 2025 at 09:49:56AM +0200, Ard Biesheuvel wrote:
> On Fri, 5 Sept 2025 at 02:24, Kees Cook <[email protected]> wrote:
> >
> > Implement ARM 32-bit KCFI backend supporting ARMv7+:
> >
> > - Function preamble generation using .word directives for type ID storage
> >   at -4 byte offset from function entry point (no prefix NOPs needed due to
> >   4-byte instruction alignment).
> >
> > - Use movw/movt instructions for 32-bit immediate loading.
> >
> > - Trap debugging through UDF instruction immediate encoding following
> >   AArch64 BRK pattern for encoding registers with useful contents.
> >
> > - Scratch register allocation using r0/r1 following ARM procedure call
> >   standard for caller-saved temporary registers, though they get
> >   stack spilled due to register pressure.
> >
> > Assembly Code Pattern for ARM 32-bit:
> >   push {r0, r1}                ; Spill r0, r1
> >   ldr  r0, [target, #-4]       ; Load actual type ID from preamble
> >   movw r1, #type_id_low        ; Load expected type (lower 16 bits)
> >   movt r1, #type_id_high       ; Load upper 16 bits with top instruction
> >   cmp  r0, r1                  ; Compare type IDs directly
> >   pop  {r0, r1}                ; Reload r0, r1
> 
> We could avoid the MOVW/MOVT pair and the spilling by doing something
> along the lines of
> 
> ldr   ip, [target, #-4]
> eor   ip, ip, #type_id[0]
> eor   ip, ip, #type_id[1] << 8
> eor   ip, ip, #type_id[2] << 16
> eors  ip, ip, #type_id[3] << 24
> ldrne ip, =type_id[3:0]

Ah-ha, nice. And it could re-load the type_id on the slow path instead
of unconditionally, I guess? (So no "ne" suffix needed there.)

  ...
  eors  ip, ip, #type_id[3] << 24
  beq .Lkcfi_call
.Lkcfi_trap:
  ldr ip, =type_id[3:0]
  udf #nnn
.Lkcfi_call:
  blx target
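
To convince myself the byte-wise EOR chain really is equivalent to a
full 32-bit compare, here is a minimal sketch that models it in Python.
The function name kcfi_xor_check is made up for illustration, and this
only models the arithmetic, not the actual instruction encodings or
flag handling:

```python
def kcfi_xor_check(actual: int, expected: int) -> int:
    """Model the ldr + 4x eor sequence; return the final value of ip."""
    ip = actual & 0xFFFFFFFF  # ldr ip, [target, #-4]
    for n in range(4):
        # eor ip, ip, #type_id[n] << (8 * n)
        imm = ((expected >> (8 * n)) & 0xFF) << (8 * n)
        ip ^= imm
    # The final eors sets Z exactly when ip is zero here.
    return ip


if __name__ == "__main__":
    # Matching IDs leave ip == 0 (take the beq); mismatches leave ip != 0.
    print(hex(kcfi_xor_check(0x12345678, 0x12345678)))
    print(hex(kcfi_xor_check(0x12345678, 0x12345679)))
```

The four immediates together XOR to the full expected type_id, so ip
ends up as actual ^ expected, which is zero only when the IDs match.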


> 
> Note that IP (R12) should be dead before a function call. Here it is
> conditionally loaded with the expected target typeid, removing the
> need to decode the instructions to recover it when the trap occurs.
> 
> This should compile to Thumb2 as well as ARM encodings.

Won't IP get used as the target register if r0-r3 are used for passing
arguments? AAPCS implies this is how it'll go (4 arguments in registers,
the rest on stack), but when I tried to force this to happen, it looked
like it'd only pass 3 via registers, and would make the call with r3.

I can't tell whether it is safe to use IP unconditionally here.

-- 
Kees Cook
