On Fri, Sep 12, 2025 at 11:43:00AM +0200, Ard Biesheuvel wrote:
> On Fri, 12 Sept 2025 at 11:08, Kees Cook <k...@kernel.org> wrote:
> >
> > On Fri, Sep 12, 2025 at 02:03:08AM -0700, Kees Cook wrote:
> > > On Thu, Sep 11, 2025 at 09:49:56AM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 5 Sept 2025 at 02:24, Kees Cook <k...@kernel.org> wrote:
> > > > >
> > > > > Implement ARM 32-bit KCFI backend supporting ARMv7+:
> > > > >
> > > > > - Function preamble generation using .word directives for type ID storage
> > > > >   at -4 byte offset from function entry point (no prefix NOPs needed due to
> > > > >   4-byte instruction alignment).
> > > > >
> > > > > - Use movw/movt instructions for 32-bit immediate loading.
> > > > >
> > > > > - Trap debugging through UDF instruction immediate encoding following
> > > > >   AArch64 BRK pattern for encoding registers with useful contents.
> > > > >
> > > > > - Scratch register allocation using r0/r1 following ARM procedure call
> > > > >   standard for caller-saved temporary registers, though they get
> > > > >   stack spilled due to register pressure.
> > > > >
> > > > > Assembly Code Pattern for ARM 32-bit:
> > > > >   push {r0, r1}                ; Spill r0, r1
> > > > >   ldr  r0, [target, #-4]       ; Load actual type ID from preamble
> > > > >   movw r1, #type_id_low        ; Load expected type (lower 16 bits)
> > > > >   movt r1, #type_id_high       ; Load upper 16 bits with top instruction
> > > > >   cmp  r0, r1                  ; Compare type IDs directly
> > > > >   pop  {r0, r1}                ; Reload r0, r1
> > > >
> > > > We could avoid the MOVW/MOVT pair and the spilling by doing something
> > > > along the lines of
> > > >
> > > > ldr   ip, [target, #-4]
> > > > eor   ip, ip, #type_id[0]
> > > > eor   ip, ip, #type_id[1] << 8
> > > > eor   ip, ip, #type_id[2] << 16
> > > > eors  ip, ip, #type_id[3] << 24
> > > > ldrne ip, =type_id[3:0]
> > >
> > > Ah-ha, nice. And it could re-load the type_id on the slow path instead
> > > of unconditionally, I guess? (So no "ne" suffix needed there.)
> > >
> > >   ...
> > >   eors  ip, ip, #type_id[3] << 24
> > >   beq .Lkcfi_call
> > > .Lkcfi_trap:
> > >   ldr ip, =type_id[3:0]
> 
> Yeah better. If you use the right compiler abstraction to emit this
> load, it will be turned into MOVW/MOVT if the target supports it.
> 
> > >   udf #nnn
> > > .Lkcfi_call:
> > >   blx target
> > >
> > >
> > > >
> > > > Note that IP (R12) should be dead before a function call. Here it is
> > > > conditionally loaded with the expected target typeid, removing the
> > > > need to decode the instructions to recover it when the trap occurs.
> > > >
> > > > This should compile to Thumb2 as well as ARM encodings.
> > >
> > > Won't IP get used as the target register if r0-r3 are used for passing
> > > arguments? AAPCS implies this is how it'll go (4 arguments in registers,
> > > the rest on stack), but when I tried to force this to happen, it looked
> > > like it'd only pass 3 via registers, and would make the call with r3.
> >
> > Wait, I misread, my test is using r4 as the target! Still, is IP guaranteed
> > to never be used for the target?
> >
> 
> The target register can be any GPR. IP is guaranteed by AAPCS not to
> play a role in parameter passing, because it is the Intra-Procedure-call
> scratch register, and may be clobbered by PLT trampolines that get
> inserted between a direct call and its target. These are not direct
> calls, of course, but the callee does not know that, and so it cannot
> make any assumptions about the value of IP.

Okay, it seems like I am close to having this replaced with the eor
method, but the backend really really does not like constructing the ldr
for me. I may leave this as a "future improvement", and just change the
Linux side of the trap handling to decode the eor insns instead of
pulling the value out of IP.
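
For illustration only, decoding the eor insns could look roughly like the
sketch below. It is not the actual kernel-side code. It assumes ARM-mode
(A32) encodings only, condition AL, IP as both source and destination, and
that the four eor/eors instructions are handed over as an array; Thumb2 would
need its own decoder. All function names are hypothetical.
encode_eor_ip() and kcfi_roundtrip() exist only to exercise the decoder.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by rot bits (0..31). */
uint32_t ror32(uint32_t v, unsigned int rot)
{
	rot &= 31;
	return rot ? (v >> rot) | (v << (32 - rot)) : v;
}

/* Expand an A32 modified immediate: imm8 rotated right by 2 * rot4. */
uint32_t a32_expand_imm(uint32_t imm12)
{
	return ror32(imm12 & 0xff, 2 * ((imm12 >> 8) & 0xf));
}

/* Match "eor{s} ip, ip, #imm" with condition AL (S bit and imm12 ignored). */
int is_eor_ip_imm(uint32_t insn)
{
	return (insn & 0xffeff000u) == 0xe22cc000u;
}

/*
 * XOR the four immediates back together to recover the expected type ID.
 * Returns 0 on success, -1 if any instruction is not the assumed eor form.
 */
int kcfi_recover_type_id(const uint32_t insns[4], uint32_t *type_id)
{
	uint32_t id = 0;
	int i;

	for (i = 0; i < 4; i++) {
		if (!is_eor_ip_imm(insns[i]))
			return -1;
		id ^= a32_expand_imm(insns[i] & 0xfff);
	}
	*type_id = id;
	return 0;
}

/* Test helper: encode "eor{s} ip, ip, #(byte n of type_id) << (8 * n)". */
uint32_t encode_eor_ip(uint32_t type_id, unsigned int n, int set_flags)
{
	uint32_t imm8 = (type_id >> (8 * n)) & 0xff;
	uint32_t rot4 = (16 - 4 * n) & 0xf;	/* ROR by 32 - 8n == LSL by 8n */

	return 0xe22cc000u | ((uint32_t)!!set_flags << 20) | (rot4 << 8) | imm8;
}

/* Test helper: encode a type ID as four eors, then decode it again. */
uint32_t kcfi_roundtrip(uint32_t type_id)
{
	uint32_t insns[4], out = 0;
	unsigned int n;

	for (n = 0; n < 4; n++)
		insns[n] = encode_eor_ip(type_id, n, n == 3);
	if (kcfi_recover_type_id(insns, &out))
		return ~type_id;	/* guaranteed mismatch on decode failure */
	return out;
}
```

Since XOR is order-independent, the decoder does not care which byte each
eor carries or which rotation the assembler picked for it.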

-- 
Kees Cook
