On Fri, Sep 12, 2025 at 11:43:00AM +0200, Ard Biesheuvel wrote:
> On Fri, 12 Sept 2025 at 11:08, Kees Cook <k...@kernel.org> wrote:
> >
> > On Fri, Sep 12, 2025 at 02:03:08AM -0700, Kees Cook wrote:
> > > On Thu, Sep 11, 2025 at 09:49:56AM +0200, Ard Biesheuvel wrote:
> > > > On Fri, 5 Sept 2025 at 02:24, Kees Cook <k...@kernel.org> wrote:
> > > > >
> > > > > Implement ARM 32-bit KCFI backend supporting ARMv7+:
> > > > >
> > > > > - Function preamble generation using .word directives for type ID
> > > > >   storage at -4 byte offset from function entry point (no prefix
> > > > >   NOPs needed due to 4-byte instruction alignment).
> > > > >
> > > > > - Use movw/movt instructions for 32-bit immediate loading.
> > > > >
> > > > > - Trap debugging through UDF instruction immediate encoding following
> > > > >   AArch64 BRK pattern for encoding registers with useful contents.
> > > > >
> > > > > - Scratch register allocation using r0/r1 following ARM procedure call
> > > > >   standard for caller-saved temporary registers, though they get
> > > > >   stack spilled due to register pressure.
> > > > >
> > > > > Assembly Code Pattern for ARM 32-bit:
> > > > >   push {r0, r1}          ; Spill r0, r1
> > > > >   ldr  r0, [target, #-4] ; Load actual type ID from preamble
> > > > >   movw r1, #type_id_low  ; Load expected type (lower 16 bits)
> > > > >   movt r1, #type_id_high ; Load upper 16 bits with top instruction
> > > > >   cmp  r0, r1            ; Compare type IDs directly
> > > > >   pop  {r0, r1}          ; Reload r0, r1
> > > >
> > > > We could avoid the MOVW/MOVT pair and the spilling by doing something
> > > > along the lines of
> > > >
> > > >   ldr   ip, [target, #-4]
> > > >   eor   ip, ip, #type_id[0]
> > > >   eor   ip, ip, #type_id[1] << 8
> > > >   eor   ip, ip, #type_id[2] << 16
> > > >   eors  ip, ip, #type_id[3] << 24
> > > >   ldrne ip, =type_id[3:0]
> > >
> > > Ah-ha, nice. And it could re-load the type_id on the slow path instead
> > > of unconditionally, I guess? (So no "ne" suffix needed there.)
> > >
> > >   ...
> > >   eors ip, ip, #type_id[3] << 24
> > >   beq  .Lkcfi_call
> > > .Lkcfi_trap:
> > >   ldr  ip, =type_id[3:0]
>
> Yeah better. If you use the right compiler abstraction to emit this
> load, it will be turned into MOVW/MOVT if the target supports it.
>
> > >   udf  #nnn
> > > .Lkcfi_call:
> > >   blx  target
> > >
> > > >
> > > > Note that IP (R12) should be dead before a function call. Here it is
> > > > conditionally loaded with the expected target typeid, removing the
> > > > need to decode the instructions to recover it when the trap occurs.
> > > >
> > > > This should compile to Thumb2 as well as ARM encodings.
> > >
> > > Won't IP get used as the target register if r0-r3 are used for passing
> > > arguments? AAPCS implies this is how it'll go (4 arguments in registers,
> > > the rest on stack), but when I tried to force this to happen, it looked
> > > like it'd only pass 3 via registers, and would make the call with r3.
> >
> > Wait, I misread, my test is using r4 as the target! Still, is IP guaranteed
> > to never be used for the target?
> >
>
> The target register can be any GPR. IP is guaranteed by AAPCS not to
> play a role in parameter passing, because it is the Inter Procedural
> scratch register, and may be clobbered by PLT trampolines that get
> inserted between a direct call and its target. These are not direct
> calls, of course, but the callee does not know that, and so it cannot
> make any assumptions about the value of IP.
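Why the EOR chain above works as an equality test can be shown with a small C model. This is a sketch under stated assumptions, not the GCC backend or kernel code: `kcfi_eor_chain()`, `a32_imm_encodable()` and `ror32()` are hypothetical helper names. XORing the loaded word with each byte of the expected type ID in place leaves zero (so `eors` sets Z) only when all 32 bits match. Each byte shifted left by 0, 8, 16 or 24 also fits the A32 (ARM) modified-immediate form, an 8-bit value rotated right by an even amount, so the fast path needs neither MOVW/MOVT nor a literal load. The model covers the A32 immediate form only; Thumb-2 uses a different immediate scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate right by n bits; n == 0 is special-cased to avoid a shift by 32. */
static uint32_t ror32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x >> n) | (x << (32 - n)) : x;
}

/*
 * Hypothetical check: can v be an A32 data-processing immediate, i.e.
 * v == ROR(imm8, rot) for some 8-bit imm8 and even rot in 0..30?
 */
static bool a32_imm_encodable(uint32_t v)
{
	for (unsigned int rot = 0; rot < 32; rot += 2) {
		/* ROR by (32 - rot) is a rotate-left by rot, undoing ROR(imm8, rot). */
		if ((ror32(v, (32 - rot) & 31) & ~0xffu) == 0)
			return true;
	}
	return false;
}

/*
 * Hypothetical model of: ldr ip, [target, #-4]; eor x3; eors.
 * Returns the final value of ip; "eors" sets Z exactly when this is 0.
 */
static uint32_t kcfi_eor_chain(uint32_t loaded, uint32_t expected)
{
	uint32_t ip = loaded;

	for (unsigned int k = 0; k < 4; k++)
		ip ^= expected & (0xffu << (8 * k)); /* eor ip, ip, #type_id[k] << 8k */
	return ip;
}
```

Because the four byte-masked immediates together XOR to the full expected ID, the result equals `loaded ^ expected`, which is zero only on a match.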

Okay, it seems like I am close to having this replaced with the eor method,
but the backend really really does not like constructing the ldr for me. I
may leave this as a "future improvement", and just change the Linux side of
the trap handling to decode the eor insns instead of pulling the value out
of IP.

-- 
Kees Cook
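The fallback described above, decoding the EOR instructions in the trap handler, amounts to expanding each EOR immediate and XORing them back together. A hedged sketch in C, for the A32 (ARM) encoding only: Thumb-2 encodes its immediates differently and is not handled, the function names are hypothetical, it does not check that Rd/Rn is IP, and how a real handler would locate the four instructions relative to the `udf` is left open. It is not the actual Linux trap handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate right by n bits; n == 0 is special-cased to avoid a shift by 32. */
static uint32_t ror32(uint32_t x, unsigned int n)
{
	n &= 31;
	return n ? (x >> n) | (x << (32 - n)) : x;
}

/*
 * Hypothetical helper: if insn is A32 "EOR{S}<c> Rd, Rn, #imm", store the
 * expanded immediate in *imm and return true.
 * Layout: cond[31:28] 0010 001S Rn[19:16] Rd[15:12] rotate[11:8] imm8[7:0]
 */
static bool a32_decode_eor_imm(uint32_t insn, uint32_t *imm)
{
	if ((insn >> 28) == 0xf)                  /* unconditional space: not EOR */
		return false;
	if ((insn & 0x0fe00000u) != 0x02200000u)  /* I=1, opcode=EOR; S ignored */
		return false;
	*imm = ror32(insn & 0xffu, ((insn >> 8) & 0xfu) * 2);
	return true;
}

/* Recover the expected type ID by XORing the immediates of n EOR insns. */
static bool kcfi_recover_type_id(const uint32_t *insns, size_t n,
				 uint32_t *type_id)
{
	uint32_t id = 0;

	for (size_t i = 0; i < n; i++) {
		uint32_t imm;

		if (!a32_decode_eor_imm(insns[i], &imm))
			return false;
		id ^= imm;
	}
	*type_id = id;
	return true;
}
```

For example, the words 0xE22CC078, 0xE22CCC56, 0xE22CC834 and 0xE23CC412 (eor/eors ip, ip with the bytes of 0x12345678 at shifts 0, 8, 16 and 24, encoded by hand for this sketch) decode back to 0x12345678.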