> On 11 Jun 2025, at 16:22, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> The PCS defines a lazy save scheme for managing ZA across normal
> "private-ZA" functions. GCC currently uses this scheme for calls
> to all private-ZA functions (rather than using caller-save).
>
> Therefore, before a sequence of calls to private-ZA functions, GCC emits
> code to set up a lazy save. After the sequence of calls, GCC emits code
> to check whether lazy save was committed and restore the ZA contents
> if so.
>
> These sequences are emitted by the mode-switching pass, in an attempt
> to reduce the number of redundant saves and restores.
>
> The lazy save scheme also means that, before a function can use ZA,
> it must first conditionally store the old contents of ZA to the caller's
> lazy save buffer, if any.
>
> This all creates some relatively complex dependencies between
> setup code, save/restore code, and normal reads from and writes to ZA.
> These dependencies are modelled using special fake hard registers:
>
> ;; Sometimes we use placeholder instructions to mark where later
> ;; ABI-related lowering is needed. These placeholders read and
> ;; write this register. Instructions that depend on the lowering
> ;; read the register.
> (LOWERING_REGNUM 87)
>
> ;; Represents the contents of the current function's TPIDR2 block,
> ;; in abstract form.
> (TPIDR2_BLOCK_REGNUM 88)
>
> ;; Holds the value that the current function wants PSTATE.ZA to be.
> ;; The actual value can sometimes vary, because it does not track
> ;; changes to PSTATE.ZA that happen during a lazy save and restore.
> ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
> (SME_STATE_REGNUM 89)
>
> ;; Instructions write to this register if they set TPIDR2_EL0 to a
> ;; well-defined value. Instructions read from the register if they
> ;; depend on the result of such writes.
> ;;
> ;; The register does not model the architected TPIDR2_ELO, just the
> ;; current function's management of it.
> (TPIDR2_SETUP_REGNUM 90)
>
> ;; Represents the property "has an incoming lazy save been committed?".
> (ZA_FREE_REGNUM 91)
>
> ;; Represents the property "are the current function's ZA contents
> ;; stored in the lazy save buffer, rather than in ZA itself?".
> (ZA_SAVED_REGNUM 92)
>
> ;; Represents the contents of the current function's ZA state in
> ;; abstract form. At various times in the function, these contents
> ;; might be stored in ZA itself, or in the function's lazy save buffer.
> ;;
> ;; The contents persist even when the architected ZA is off. Private-ZA
> ;; functions have no effect on its contents.
> (ZA_REGNUM 93)
>
> Every normal read from ZA and write to ZA depends on SME_STATE_REGNUM,
> in order to sequence the code with the initial setup of ZA and
> with the lazy save scheme.
>
> The code to restore ZA after a call involves several instructions,
> including conditional control flow. It is initially represented as
> a single define_insn and is split late, after shrink-wrapping and
> prologue/epilogue insertion.
>
> The split form of the restore instruction includes a conditional call
> to __arm_tpidr2_restore:
>
> (define_insn "aarch64_tpidr2_restore"
> [(set (reg:DI ZA_SAVED_REGNUM)
> (unspec:DI [(reg:DI R0_REGNUM)] UNSPEC_TPIDR2_RESTORE))
> (set (reg:DI SME_STATE_REGNUM)
> (unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
> ...
> )
>
> The write to SME_STATE_REGNUM indicates the end of the region where
> ZA_REGNUM might differ from the real contents of ZA. In other words,
> it is the point at which normal reads from ZA and writes to ZA
> can safely take place.
>
> To finally get to the point, the problem in this PR was that the
> unsplit aarch64_restore_za pattern was missing this change to
> SME_STATE_REGNUM. It could therefore be deleted as dead before
> it had chance to be split. The split form had the correct dataflow,
> but the unsplit form didn't.
>
> Unfortunately, the tests for this code tended to use calls and asms
> to model regions of ZA usage, and those don't seem to be affected
> in the same way.
>
> Tested on aarch64-linux-gnu. OK for trunk and branches?
>
Ok by me.
> Richard
>
>
> gcc/
> PR target/120624
> * config/aarch64/aarch64.md (SME_STATE_REGNUM): Expand on comments.
> * config/aarch64/aarch64-sme.md (aarch64_restore_za): Also set
> SME_STATE_REGNUM
Server-side hooks would catch this, but full stop needed here.
Thanks,
Kyrill
>
> gcc/testsuite/
> PR target/120624
> * gcc.target/aarch64/sme/za_state_7.c: New test.
> ---
> gcc/config/aarch64/aarch64-sme.md | 2 ++
> gcc/config/aarch64/aarch64.md | 8 +++++++
> .../gcc.target/aarch64/sme/za_state_7.c | 21 +++++++++++++++++++
> 3 files changed, 31 insertions(+)
> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
>
> diff --git a/gcc/config/aarch64/aarch64-sme.md
> b/gcc/config/aarch64/aarch64-sme.md
> index c49affd0dd3..f7958c90eae 100644
> --- a/gcc/config/aarch64/aarch64-sme.md
> +++ b/gcc/config/aarch64/aarch64-sme.md
> @@ -373,6 +373,8 @@ (define_insn_and_split "aarch64_restore_za"
> (reg:DI SME_STATE_REGNUM)
> (reg:DI TPIDR2_SETUP_REGNUM)
> (reg:DI ZA_SAVED_REGNUM)] UNSPEC_RESTORE_ZA))
> + (set (reg:DI SME_STATE_REGNUM)
> + (unspec:DI [(reg:DI SME_STATE_REGNUM)] UNSPEC_TPIDR2_RESTORE))
> (clobber (reg:DI R0_REGNUM))
> (clobber (reg:DI R14_REGNUM))
> (clobber (reg:DI R15_REGNUM))
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 6dbc9faf713..e11e13033d2 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -136,6 +136,14 @@ (define_constants
> ;; The actual value can sometimes vary, because it does not track
> ;; changes to PSTATE.ZA that happen during a lazy save and restore.
> ;; Those effects are instead tracked by ZA_SAVED_REGNUM.
> + ;;
> + ;; Sequences also write to this register if they synchronize the
> + ;; actual contents of ZA and PSTATE.ZA with the current function's
> + ;; ZA_REGNUM and SME_STATE_REGNUM. Conceptually, these extra writes
> + ;; do not change the value of SME_STATE_REGNUM. They simply act as
> + ;; sequencing points. They means that all direct accesses to ZA can
> + ;; depend only on ZA_REGNUM and SME_STATE_REGNUM, rather than also
> + ;; depending on ZA_SAVED_REGNUM etc.
> (SME_STATE_REGNUM 89)
>
> ;; Instructions write to this register if they set TPIDR2_EL0 to a
> diff --git a/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> new file mode 100644
> index 00000000000..38bc1347159
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sme/za_state_7.c
> @@ -0,0 +1,21 @@
> +// { dg-options "-O -fno-optimize-sibling-calls -fomit-frame-pointer" }
> +
> +#include <arm_sme.h>
> +
> +void callee();
> +
> +__arm_new("za") __arm_locally_streaming int test()
> +{
> + svbool_t all = svptrue_b8();
> + svint8_t expected = svindex_s8(1, 1);
> + svwrite_hor_za8_m(0, 0, all, expected);
> +
> + callee();
> +
> + svint8_t actual = svread_hor_za8_m(svdup_s8(0), all, 0, 0);
> + return svptest_any(all, svcmpne(all, expected, actual));
> +}
> +
> +// { dg-final { scan-assembler {\tbl\t__arm_tpidr2_save\n} } }
> +// { dg-final { scan-assembler {\tbl\t__arm_tpidr2_restore\n} } }
> +// { dg-final { scan-assembler-times {\tsmstart\tza\n} 2 } }
> --
> 2.43.0
>