https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89967

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=98877

--- Comment #10 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Here is the current state of this as of today, August 13, 2025:
With -fstack-reuse=none, the trunk has a few fewer mov instructions than
previous versions. Without it, the code generation is the same.

With https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692520.html, there
is one extra move inside the loop, but we use no callee-saved registers.

It is worse because we still have some propagation to do:
```
  D.23406 = __builtin_aarch64_ld2_lanev16qi_usus (_13, g1b, 8);
  g1b = D.23406;
  __b ={v} {CLOBBER(eos)};
  _99 = BIT_FIELD_REF <o1v_41(D), 32, 96>;
  _15 = (sizetype) _99;
  _16 = in_59(D) + _15;
  __b = g1b;
  D.23399 = __builtin_aarch64_ld2_lanev16qi_usus (_16, g1b, 12);
```

If we compile again with -fstack-reuse=none, there is better code generation;
that is because the `{v} {CLOBBER(eos)}` statements shown above are then absent.
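
For readers without an AArch64 setup, here is a rough, portable sketch of the
source shape that I would expect to produce this kind of pattern. It is not the
testcase from this PR: `pair16_t`, `load2_lane` and `gather` are hypothetical
stand-ins for the NEON aggregate type, the lane-load wrapper and the calling
loop body. The by-value parameter `b` plays the role of `__b` in the dump, and
`g` plays the role of `g1b`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a two-vector NEON aggregate. */
typedef struct {
    unsigned char val[2][16];
} pair16_t;

/* Hypothetical stand-in for a lane-load wrapper.  The aggregate is taken
   by value, so after inlining the parameter copy is a separate variable
   whose lifetime ends when the inlined body ends.  */
static inline pair16_t load2_lane(const unsigned char *p, pair16_t b, int lane)
{
    b.val[0][lane] = p[0];
    b.val[1][lane] = p[1];
    return b;
}

/* Chain several lane loads through one variable, copying it into the
   wrapper's parameter each time.  */
pair16_t gather(const unsigned char *in, const size_t offs[4])
{
    pair16_t g;
    memset(&g, 0, sizeof g);
    g = load2_lane(in + offs[0], g, 0);
    g = load2_lane(in + offs[1], g, 4);
    g = load2_lane(in + offs[2], g, 8);
    g = load2_lane(in + offs[3], g, 12);
    return g;
}
```

Compiling this with `gcc -O2 -c -fdump-tree-optimized`, with and without
-fstack-reuse=none, is one way to compare the dumps; whether this reduced form
shows the same clobbers and copies as the real testcase is an assumption I have
not verified.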

So I have one more patch, which I will submit once
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692520.html is approved,
to get code generation similar to what -fstack-reuse=none gives now.
But that is still not optimal; I think PR 98877 is related to that issue.
