On 17/12/2021 15:52, Andrea Corallo wrote:
Hi Richard,

thanks for reviewing!  Some comments inline.

Richard Earnshaw <richard.earns...@foss.arm.com> writes:
On 05/11/2021 08:52, Andrea Corallo via Gcc-patches wrote:
Hi all,
this patch enables address return signature and verification based
on
Armv8.1-M Pointer Authentication [1].
To sign the return address, we use the PAC R12, LR, SP instruction
upon function entry.  This is signing LR using SP and storing the
result in R12.  R12 will be pushed into the stack.
During function epilogue R12 will be popped and AUT R12, LR, SP will
be used to verify that the content of LR is still valid before return.
Here an example of PAC instrumented function prologue and epilogue:
          pac     r12, lr, sp
          push    {r3, r7, lr}
          push    {r12}
          sub     sp, sp, #4

Which, as shown here, generates a stack which does not preserve 8-byte
alignment.

I'm probably catastrofically wrong but shouldn't the stack be "all times
be aligned to a word boundary" [1]?

At a function boundary it must be 8-byte aligned (same reference). I don't think GCC really optimizes leaf functions to permit sub 8-byte alignment, but since you omitted the body of your function I might be wrong in this case.


Also, what's wrong with

        pac     r12, lr, sp
        push    {r3, r7, ip, lr}
?

AFAIK the AAPCS32 defines the Frame Record to be 2 consecutive 32-bit
values of LR and FP on the stack so that's the reason.

GCC does not currently support AAPCS frame chains as that's a relatively new feature in the AAPCS; so it's not something you need to be concerned about right now. The AAPCS frame chain uses R11 as the frame-chain register anyway. However, you are right that this does affect -mtpcs-frame and will become relevant when we do add support for AAPCS frame chains.


Which saves 2 bytes in the prologue and ...

          [...] function body
          add     sp, sp, #4
          pop     {r12}
          pop     {r3, r7, lr}
          aut     r12, lr, sp
          bx      lr

        pop     {r3, r7, ip, lr}
        aut     r12, lr, sp
        bx      lr

which saves 4 bytes in the epilogue (repeated for each instance of the
epilogue).

The patch also takes care of generating a PACBTI instruction in
place
of the sequence BTI+PAC when Branch Target Identification is enabled
contextually.


What about variadic functions?

What about functions where lr is live on entry (where it's used for
passing the closure in nested functions)?

These two patches apply on top of Tejas series posted here [2].
Regressioned and arm-linux-gnu aarch64-linux-gnu bootstraped.
Best Regards
    Andrea
[1]
<https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension>
[2] <https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581176.html>



+static bool arm_pac_enabled_for_curr_function_p (void);

I really don't like that name.  There are a lot of functions with
variations of 'current function' in the name already and this creates
yet another variant.  Something like
arm_current_function_pac_enabled_p() would be preferable; or, if that
really is too long, use 'current_func' which already has usage within
the compiler.

Ack

+(define_insn "pac_ip_lr_sp"
+  [(set (reg:DI IP_REGNUM)
+       (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+                   UNSPEC_PAC_IP_LR_SP))]
+  ""
+  "pac\tr12, lr, sp")
+
+(define_insn "pacbti_ip_lr_sp"
+  [(set (reg:DI IP_REGNUM)
+       (unspec:DI [(reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+                   UNSPEC_PACBTI_IP_LR_SP))]
+  ""
+  "pacbti\tr12, lr, sp")
+
+(define_insn "aut_ip_lr_sp"
+  [(unspec:DI [(reg:DI IP_REGNUM) (reg:DI SP_REGNUM) (reg:DI LR_REGNUM)]
+              UNSPEC_AUT_IP_LR_SP)]
+  ""
+  "aut\tr12, lr, sp")
+

I think all these need a length attribute.  Also, they should only be
enabled for thumb2 (certainly not in Arm state).
And when using explicit register names in an asm, prefix each name
with '%|', just in case the assembler dialect has a register name
prefix.

Ack

The names are somewhat unweildy, can't we use something more usefully
descriptive, like 'pac_nop', 'pacbti_nop' and 'aut_nop', since all
these instructions are using the architectural NOP space.

Finally, I think we need some more tests that cover the various
frame-pointer flavours when used in combination with this feature and
for various corners of the PCS.

Could you give some more indications of the falvors we what to have tests
for?

I'm thinking mostly about test cases for the additional situations I've described above. But there's also testing that the code does the right thing with -mtpcs-frame.

It might be that we want to declare -mtpcs-frame and branch protection incompatible, which would save a lot of complicated validation. That's probably acceptable because -mtpcs-frame is essentially deprecated anyway (and will hopefully be superseded with AAPCS frame-chain support before long). However, if we do that, then we at least need some option compatibility diagnostic to reject such a combination and the incompatibility will need documention.

R.


Thanks

   Andrea

[1] <https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#id46>

Reply via email to