On Mon, Feb 09, 2026 at 10:21:15AM -0800, Andrew Pinski wrote:
> On Mon, Feb 9, 2026 at 10:13 AM Alice Carlotti <[email protected]> wrote:
> >
> > On Fri, Jan 16, 2026 at 01:25:23AM -0800, Andrew Pinski wrote:
> > > I think this is the hard approach to do this.
> >
> > Agreed now.
> >
> > > The easy approach is to return false from aarch64_use_return_insn_p if
> > > you want to disable shrink wrapping for a function.
> > > So just in aarch64_use_return_insn_p, add:
> > > if (aarch64_cfun_enables_pstate_sm ())
> > > return false;
> >
> > This isn't quite the right place to check it, since shrink wrapping checks
> > targetm.have_simple_return, whereas the above function is called by
> > targetm.have_return.
> >
> > Instead, I have recreated the aarch64_use_simple_return_insn function that
> > existed previously until r10-3507-gce9d2a37f2db20, and put the check there.
> >
> > However, I'm now confused about the comment for the aarch64_use_return_insn
> > function - I'm not convinced this function is doing what it claims.
> >
> > >
> > > If that does not work, then let's add a target hook that is used here.
> > > Because there might be other targets that want to disable shrink
> > > wrapping based on the attributes and disabling it via the flag seems
> > > like a hack.
> >
> > This did feel like a bit of a hack, but I thought it might be in keeping
> > with
> > some other code that seems to update optimisation information for SME
> > functions. I also hadn't realised that the targetm.have_simple_return check
> > could already be used to inhibit shrink wrapping (and had been in the past).
> >
> > >
> > > Thanks,
> > > Andrew Pinski
> >
> >
> > I've also updated the test case. For some reason the original test case
> > wasn't
> > passing or failing as I intended, so I've changed the callee to use
> > streaming
> > mode, and simplified the checks for bar. The checks in baz are probably
> > redundant now that I'm no longer messing around with optimization flags, so
> > I
> > could remove that if preferred.
> >
> > Is this version OK for master and backport to affected branches?
>
> Ok for all affected open branches (which I think is GCC 13, 14, and 15).
Thanks. Unless I miscounted, SME only arrived in GCC 14, so 13 is unaffected.
(I've pushed to master now, and will push to GCC 14+15 once I've finished
testing those.)
Alice
>
> Thanks,
> Andrew
>
> >
> > Thanks,
> > Alice
> >
> > ---
> >
> > The meaning of poly_int values changes depending on whether we are in
> > streaming or non-streaming mode, but this dependency is not explicitly
> > tracked. Locally-streaming functions can change streaming state in the
> > prologue and epilogue, so it is unsafe to apply shrink wrapping to these
> > functions, as doing so could change the mode seen by instructions like
> > cntd.
> >
> > gcc/ChangeLog:
> >
> > PR target/123624
> > * config/aarch64/aarch64-protos.h
> > (aarch64_use_simple_return_insn_p): New.
> > * config/aarch64/aarch64.cc
> > (aarch64_use_simple_return_insn_p): New, used...
> > * config/aarch64/aarch64.md (simple_return): ...here.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/123624
> > * gcc.target/aarch64/sme/sme-shrinkwrap.c: New test.
> >
> >
> > diff --git a/gcc/config/aarch64/aarch64-protos.h
> > b/gcc/config/aarch64/aarch64-protos.h
> > index
> > 48d3a3de2356d385fc9612b5a0c546623bbc60fe..d1f2873f208bba675aab82dcf2831da3350785c6
> > 100644
> > --- a/gcc/config/aarch64/aarch64-protos.h
> > +++ b/gcc/config/aarch64/aarch64-protos.h
> > @@ -984,6 +984,7 @@ bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
> > int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
> > bool aarch64_is_mov_xn_imm (unsigned HOST_WIDE_INT);
> > bool aarch64_use_return_insn_p (void);
> > +bool aarch64_use_simple_return_insn_p (void);
> > const char *aarch64_output_casesi (rtx *);
> > const char *aarch64_output_load_tp (rtx);
> > const char *aarch64_output_sme_zero_za (rtx);
> > diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> > index
> > 293afa52b3b38781b765ca939ed51c280313bab4..dfe1d8ab309b3483318ac9e634510288d2dd6475
> > 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -10765,6 +10765,20 @@ aarch64_use_return_insn_p (void)
> > return known_eq (cfun->machine->frame.frame_size, 0);
> > }
> >
> > +/* Return false for locally streaming functions in order to avoid
> > + shrink-wrapping them. Shrink-wrapping is unsafe when the function
> > prologue
> > + and epilogue contain streaming state change, because these implicitly
> > change
> > + the meaning of poly_int values. */
> > +
> > +bool
> > +aarch64_use_simple_return_insn_p (void)
> > +{
> > + if (aarch64_cfun_enables_pstate_sm ())
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > /* Generate the epilogue instructions for returning from a function.
> > This is almost exactly the reverse of the prolog sequence, except
> > that we need to insert barriers to avoid scheduling loads that read
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index
> > 71458bf78f5cc4d926d7c5e0467daec9a5d75a03..b7f2f42be6e30c02b74703474ce640d795bb142c
> > 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -1414,7 +1414,7 @@
> >
> > (define_insn "simple_return"
> > [(simple_return)]
> > - ""
> > + "aarch64_use_simple_return_insn_p ()"
> > {
> > output_asm_insn ("ret", operands);
> > return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> > diff --git a/gcc/testsuite/gcc.target/aarch64/sme/sme-shrinkwrap.c
> > b/gcc/testsuite/gcc.target/aarch64/sme/sme-shrinkwrap.c
> > new file mode 100644
> > index
> > 0000000000000000000000000000000000000000..b8dfd5f92f62f048ce14b9e2196d325e33cf5c20
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/sme/sme-shrinkwrap.c
> > @@ -0,0 +1,78 @@
> > +/* { dg-options "-O3 -fshrink-wrap" } */
> > +/* { dg-do run { target { aarch64_sme_hw && aarch64_sve_hw } } } */
> > +/* { dg-do compile { target { ! { aarch64_sme_hw && aarch64_sve_hw } } } }
> > */
> > +/* { dg-final { check-function-bodies "**" "" } } */
> > +
> > +#include <arm_sme.h>
> > +
> > +#pragma GCC target "+sve"
> > +
> > +[[gnu::noipa]]
> > +__arm_streaming
> > +int callee (int x)
> > +{
> > + return 0;
> > +}
> > +
> > +/*
> > +** foo:
> > +** cbnz w0, [^\n]*
> > +** cntd x0
> > +** ret
> > +** ...
> > +*/
> > +__arm_streaming
> > +int foo(int x)
> > +{
> > + if (x)
> > + return callee(3);
> > + return svcntd();
> > +}
> > +
> > +/*
> > +** bar:
> > +** ...
> > +** smstart [^\n]*
> > +** ...
> > +** (
> > +** cntd [^\n]*
> > +** ...
> > +** cbn?z [^\n]*
> > +** |
> > +** cbn?z [^\n]*
> > +** ...
> > +** cntd [^\n]*
> > +** )
> > +** ...
> > +*/
> > +
> > +__arm_locally_streaming
> > +int bar(int x)
> > +{
> > + if (x)
> > + return callee(3);
> > + return svcntd();
> > +}
> > +
> > +/*
> > +** baz:
> > +** cbnz w0, [^\n]*
> > +** cntd x0
> > +** ret
> > +** ...
> > +*/
> > +__arm_streaming
> > +int baz(int x)
> > +{
> > + if (x)
> > + return callee(3);
> > + return svcntd();
> > +}
> > +
> > +[[gnu::noipa]]
> > +int main()
> > +{
> > + if (bar(0) != svcntsd())
> > + __builtin_abort();
> > + return 0;
> > +}