> On 11 Jun 2025, at 16:22, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> The PCS defines a lazy save scheme for managing ZA across normal
> "private-ZA" functions.  GCC currently uses this scheme for calls
> to all private-ZA functions (rather than using caller-save).
> 
> Therefore, before a sequence of calls to private-ZA functions, GCC emits
> code to set up a lazy save.  After the sequence of calls, GCC emits code
> to check whether lazy save was committed and restore the ZA contents
> if so.
> 
> These sequences are emitted by the mode-switching pass, in an attempt
> to reduce the number of redundant saves and restores.
> 
> The lazy save scheme also means that, before a function can use ZA,
> it must first conditionally store the old contents of ZA to the caller's
> lazy save buffer, if any.
> 
> This all creates some relatively complex dependencies between
> setup code, save/restore code, and normal reads from and writes to ZA.
> These dependencies are modelled using special fake hard registers:
> 
>    ;; Sometimes we use placeholder instructions to mark where later
>    ;; ABI-related lowering is needed.  These placeholders read and
>    ;; write this register.  Instructions that depend on the lowering
>    ;; read the register.
>    (LOWERING_REGNUM 87)
> 
>    ;; Represents the contents of the current function's TPIDR2 block,
>    ;; in abstract form.
>    (TPIDR2_BLOCK_REGNUM 88)
> 
>    ;; Holds the value that the current function wants PSTATE.ZA to be.
>    ;; The actual value can sometimes vary, because it does not track
>    ;; changes to PSTATE.ZA that happen during a lazy save and restore.
>    ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
>    (SME_STATE_REGNUM 89)
> 
>    ;; Instructions write to this register if they set TPIDR2_EL0 to a
>    ;; well-defined value.  Instructions read from the register if they
>    ;; depend on the result of such writes.
>    ;;
>    ;; The register does not model the architected TPIDR2_ELO, just the
>    ;; current function's management of it.
>    (TPIDR2_SETUP_REGNUM 90)
> 
>    ;; Represents the property "has an incoming lazy save been committed?".
>    (ZA_FREE_REGNUM 91)
> 
>    ;; Represents the property "are the current function's ZA contents
>    ;; stored in the lazy save buffer, rather than in ZA itself?".
>    (ZA_SAVED_REGNUM 92)
> 
>    ;; Represents the contents of the current function's ZA state in
>    ;; abstract form.  At various times in the function, these contents
>    ;; might be stored in ZA itself, or in the function's lazy save buffer.
>    ;;
>    ;; The contents persist even when the architected ZA is off.  Private-ZA
>    ;; functions have no effect on its contents.
>    (ZA_REGNUM 93)
> 
> Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
> in order to sequence the code with the initial setup of ZA and
> with the lazy save scheme.
> 
> The code to restore ZA after a call involves several instructions,
> including conditional control flow.  It is initially represented as
> a single define_insn and is split late, after shrink-wrapping and
> prologue/epilogue insertion.
> 
> The split form of the restore instruction includes a conditional call
> to __arm_tpidr2_restore:
> 
> (define_insn "aarch64_tpidr2_restore"
>  [(set (reg:DI ZA_SAVED_REGNUM)
> (unspec:DI [(reg:DI R0_REGNUM)] UNSPEC_TPIDR2_RESTORE))
>   (set (reg:DI SME_STATE_REGNUM)
> (unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
>  ...
> )
> 
> The write to SME_STATE_REGNUM indicates the end of the region where
> ZA_REGNUM might differ from the real contents of ZA.  In other words,
> it is the point at which normal reads from ZA and writes to ZA
> can safely take place.
> 
> To finally get to the point, the problem in this PR was that the
> unsplit aarch64_restore_za pattern was missing this change to
> SME_STATE_REGNUM.  It could therefore be deleted as dead before
> it had chance to be split.  The split form had the correct dataflow,
> but the unsplit form didn't.
> 
> Unfortunately, the tests for this code tended to use calls and asms
> to model regions of ZA usage, and those don't seem to be affected
> in the same way.
> 
> Tested on aarch64-linux-gnu.  OK for trunk and branches?
> 

Ok by me.


> Richard
> 
> 
> gcc/
> PR target/120624
> * config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
> * config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
> SME_STATE_REGNUM

Server-side hooks would catch this, but full stop needed here.
Thanks,
Kyrill

> 
> gcc/testsuite/
> PR target/120624
> * gcc.target/aarch64/sme/za_state_7.c: New test.
> ---
> gcc/config/aarch64/aarch64-sme.md             |  2 ++
> gcc/config/aarch64/aarch64.md                 |  8 +++++++
> .../gcc.target/aarch64/sme/za_state_7.c       | 21 +++++++++++++++++++
> 3 files changed, 31 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> 
> diff --git a/gcc/config/aarch64/aarch64-sme.md 
> b/gcc/config/aarch64/aarch64-sme.md
> index c49affd0dd3..f7958c90eae 100644
> --- a/gcc/config/aarch64/aarch64-sme.md
> +++ b/gcc/config/aarch64/aarch64-sme.md
> @@ -373,6 +373,8 @@ (define_insn_and_split "aarch64_restore_za"
>    (reg:DI SME_STATE_REGNUM)
>    (reg:DI TPIDR2_SETUP_REGNUM)
>    (reg:DI ZA_SAVED_REGNUM)] UNSPEC_RESTORE_ZA))
> +   (set (reg:DI SME_STATE_REGNUM)
> + (unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
>    (clobber (reg:DI R0_REGNUM))
>    (clobber (reg:DI R14_REGNUM))
>    (clobber (reg:DI R15_REGNUM))
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6dbc9faf713..e11e13033d2 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -136,6 +136,14 @@ (define_constants
>     ;; The actual value can sometimes vary, because it does not track
>     ;; changes to PSTATE.ZA that happen during a lazy save and restore.
>     ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
> +    ;;
> +    ;; Sequences also write to this register if they synchronize the
> +    ;; actual contents of ZA and PSTATE.ZA with the current function's
> +    ;; ZA_REGNUM and SME_STATE_REGNUM.  Conceptually, these extra writes
> +    ;; do not change the value of SME_STATE_REGNUM.  They simply act as
> +    ;; sequencing points.  They means that all direct accesses to ZA can
> +    ;; depend only on ZA_REGNUM and SME_STATE_REGNUM, rather than also
> +    ;; depending on ZA_SAVED_REGNUM etc.
>     (SME_STATE_REGNUM 89)
> 
>     ;; Instructions write to this register if they set TPIDR2_EL0 to a
> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c 
> b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> new file mode 100644
> index 00000000000..38bc1347159
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> @@ -0,0 +1,21 @@
> +// { dg-options "-O -fno-optimize-sibling-calls -fomit-frame-pointer" }
> +
> +#include <arm_sme.h>
> +
> +void callee();
> +
> +__arm_new("za") __arm_locally_streaming int test()
> +{
> +  svbool_t all = svptrue_b8();
> +  svint8_t expected = svindex_s8(1, 1);
> +  svwrite_hor_za8_m(0, 0, all, expected);
> +
> +  callee();
> +
> +  svint8_t actual = svread_hor_za8_m(svdup_s8(0), all, 0, 0);
> +  return svptest_any(all, svcmpne(all, expected, actual));
> +}
> +
> +// { dg-final { scan-assembler {\tbl\t__arm_tpidr2_save\n} } }
> +// { dg-final { scan-assembler {\tbl\t__arm_tpidr2_restore\n} } }
> +// { dg-final { scan-assembler-times {\tsmstart\tza\n} 2 } }
> -- 
> 2.43.0
> 

Reply via email to