On 17/12/2021 15:52, Andrea Corallo wrote:
Hi Richard,
thanks for reviewing! Some comments inline.
Richard Earnshaw <richard.earns...@foss.arm.com> writes:
On 05/11/2021 08:52, Andrea Corallo via Gcc-patches wrote:
Hi all,
this patch enables address return signature and verification based
on
Armv8.1-M Pointer Authentication [1].
To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry. This is signing LR using SP and storing the
result in R12. R12 will be pushed into the stack.
During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.
Here an example of PAC instrumented function prologue and epilogue:
pac r12, lr, sp
push {r3, r7, lr}
push {r12}
sub sp, sp, #4
Which, as shown here, generates a stack which does not preserve 8-byte
alignment.
I'm probably catastrofically wrong but shouldn't the stack be "all times
be aligned to a word boundary" [1]?
At a function boundary it must be 8-byte aligned (same reference). I
don't think GCC really optimizes leaf functions to permit sub 8-byte
alignment, but since you omitted the body of your function I might be
wrong in this case.
Also, what's wrong with
pac r12, lr, sp
push {r3, r7, ip, lr}
?
AFAIK the AAPCS32 defines the Frame Record to be 2 consecutive 32-bit
values of LR and FP on the stack so that's the reason.
GCC does not currently support AAPCS frame chains as that's a relatively
new feature in the AAPCS; so it's not something you need to be concerned
about right now. The AAPCS frame chain uses R11 as the frame-chain
register anyway. However, you are right that this does affect
-mtpcs-frame and will become relevant when we do add support for AAPCS
frame chains.
Which saves 2 bytes in the prologue and ...
[...] function body
add sp, sp, #4
pop {r12}
pop {r3, r7, lr}
aut r12, lr, sp
bx lr
pop {r3, r7, ip, lr}
aut r12, lr, sp
bx lr
which saves 4 bytes in the epilogue (repeated for each instance of the
epilogue).
The patch also takes care of generating a PACBTI instruction in
place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.
What about variadic functions?
What about functions where lr is live on entry (where it's used for
passing the closure in nested functions)?
These two patches apply on top of Tejas series posted here [2].
Regressioned and arm-linux-gnu aarch64-linux-gnu bootstraped.
Best Regards
Andrea
[1]
<https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension>
[2] <https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581176.html>
+static bool arm_pac_enabled_for_curr_function_p (void);
I really don't like that name. There are a lot of functions with
variations of 'current function' in the name already and this creates
yet another variant. Something like
arm_current_function_pac_enabled_p() would be preferable; or, if that
really is too long, use 'current_func' which already has usage within
the compiler.
Ack
+(define_insn "pac_ip_lr_sp"
+ [(set (reg:DI IP_REGNUM)
+ (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+ UNSPEC_PAC_IP_LR_SP))]
+ ""
+ "pac\tr12, lr, sp")
+
+(define_insn "pacbti_ip_lr_sp"
+ [(set (reg:DI IP_REGNUM)
+ (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+ UNSPEC_PACBTI_IP_LR_SP))]
+ ""
+ "pacbti\tr12, lr, sp")
+
+(define_insn "aut_ip_lr_sp"
+ [(unspec:DI [(reg:DI IP_REGNUM) (reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+ UNSPEC_AUT_IP_LR_SP)]
+ ""
+ "aut\tr12, lr, sp")
+
I think all these need a length attribute. Also, they should only be
enabled for thumb2 (certainly not in Arm state).
And when using explicit register names in an asm, prefix each name
with '%|', just in case the assembler dialect has a register name
prefix.
Ack
The names are somewhat unweildy, can't we use something more usefully
descriptive, like 'pac_nop', 'pacbti_nop' and 'aut_nop', since all
these instructions are using the architectural NOP space.
Finally, I think we need some more tests that cover the various
frame-pointer flavours when used in combination with this feature and
for various corners of the PCS.
Could you give some more indications of the falvors we what to have tests
for?
I'm thinking mostly about test cases for the additional situations I've
described above. But there's also testing that the code does the right
thing with -mtpcs-frame.
It might be that we want to declare -mtpcs-frame and branch protection
incompatible, which would save a lot of complicated validation. That's
probably acceptable because -mtpcs-frame is essentially deprecated
anyway (and will hopefully be superseded with AAPCS frame-chain support
before long). However, if we do that, then we at least need some option
compatibility diagnostic to reject such a combination and the
incompatibility will need documention.
R.
Thanks
Andrea
[1] <https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#id46>