Hi folks,

Hoping for some input from Richard S here (or other AArch64 maintainers). My
question is about the machine modes we use to reference ZA in the SME ACLE
implementation.

I am particularly curious about the convention described in the following
comment above sme_2mode_function_t in aarch64-sve-builtins-functions.h:

/* General SME unspec-based functions, parameterized on both the ZA mode
   and the vector mode.  If the elements of the ZA and vector modes are
   the same size (e.g. _za64_f64 or _za32_s32) then the two mode arguments
   are equal, otherwise the first mode argument is the single-vector integer
   mode associated with the ZA suffix and the second mode argument is the
   tuple mode associated with the vector suffix.  */
template<insn_code (*CODE) (int, machine_mode, machine_mode),
         insn_code (*CODE_SINGLE) (int, machine_mode, machine_mode)>
class sme_2mode_function_t : public read_write_za<unspec_based_function_base>
{
  [...]
};

So essentially this means that for an FP intrinsic like
svmopa_za32_f32_m, we access ZA in an FP mode (VNx4SFmode), with an insn
like:

(insn 9 8 0 2 (set (reg:VNx4SF 93 za)
        (unspec:VNx4SF [
                (reg:VNx4SF 93 za)
                (reg:DI 89 sme_state)
                (const_int 0 [0])
                (reg:VNx4BI 103) repeated x2
                (reg/v:VNx4SF 101 [ zn ])
                (reg/v:VNx4SF 102 [ zm ])
            ] UNSPEC_SME_FMOPA)) "t.c":6:5 15949 {aarch64_sme_fmopavnx4sfvnx4sf}
     (nil))

but for a widening FP intrinsic like svmopa_za32_f16_m, we instead get
an integer mode for ZA (VNx4SImode):

(insn 9 8 0 2 (set (reg:VNx4SI 93 za)
        (unspec:VNx4SI [
                (reg:VNx4SI 93 za)
                (reg:DI 89 sme_state)
                (const_int 0 [0])
                (reg:VNx4BI 103) repeated x2
                (reg/v:VNx8HF 101 [ zn ])
                (reg/v:VNx8HF 102 [ zm ])
            ] UNSPEC_SME_FMOPA)) "t.c":12:5 15959 {aarch64_sme_fmopavnx4sivnx8hf}
     (nil))

which at first I found a little surprising, given that the underlying
instruction still interprets the ZA contents as floating point.
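
To make my reading of that comment concrete, here is a rough C model of
the current convention. It is only a sketch and not GCC code: the function
names sve_mode_name and za_mode_name are made up for illustration, and it
only covers single-vector operands (no tuple modes, no bf16).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a GCC function): name of the single-vector SVE
   mode with BITS-wide elements, FP if IS_FLOAT, otherwise integer.  */
const char *
sve_mode_name (unsigned bits, bool is_float)
{
  switch (bits)
    {
    case 8:  return "VNx16QI";   /* no 8-bit FP mode in this sketch */
    case 16: return is_float ? "VNx8HF" : "VNx8HI";
    case 32: return is_float ? "VNx4SF" : "VNx4SI";
    case 64: return is_float ? "VNx2DF" : "VNx2DI";
    default: return NULL;
    }
}

/* Hypothetical model of the convention in the comment quoted earlier:
   if the ZA and vector elements have the same size, ZA uses the vector
   mode; otherwise ZA uses the integer mode for the ZA suffix.  */
const char *
za_mode_name (unsigned za_bits, unsigned vec_bits, bool vec_is_float)
{
  if (za_bits == vec_bits)
    return sve_mode_name (vec_bits, vec_is_float);
  return sve_mode_name (za_bits, false);
}
```

Under this model, za_mode_name (32, 32, true) gives VNx4SF for
svmopa_za32_f32_m and za_mode_name (32, 16, true) gives VNx4SI for
svmopa_za32_f16_m, matching the two insns above.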

I was curious about the rationale for this convention.  Possible
alternatives that come to mind are:

(1) Always using an integer mode for ZA accesses (if it's OK to do it
    for the widening case above, why not always?)
(2) Matching the ZA mode to the vector operands: always use an FP mode
    of the appropriate width when the vector operands are FP operands, and
    otherwise use an integer mode.

Of these, (2) seems the most natural to me, but I'm sure there's a good
reason that it's done the way it is.
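
For clarity, this is roughly what I mean by the two alternatives, again as
a hypothetical C sketch rather than a proposed patch. The function names
za_mode_alt1 and za_mode_alt2 are invented for illustration, and the sketch
assumes ZA element widths of 16, 32 or 64 bits for the FP case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Alternative (1), hypothetical: always use the integer mode whose
   element width matches the ZA suffix.  */
const char *
za_mode_alt1 (unsigned za_bits)
{
  switch (za_bits)
    {
    case 8:  return "VNx16QI";
    case 16: return "VNx8HI";
    case 32: return "VNx4SI";
    case 64: return "VNx2DI";
    default: return NULL;
    }
}

/* Alternative (2), hypothetical: use an FP mode of the ZA element width
   when the vector operands are FP, otherwise fall back to the integer
   mode from alternative (1).  */
const char *
za_mode_alt2 (unsigned za_bits, bool vec_is_float)
{
  if (vec_is_float)
    switch (za_bits)
      {
      case 16: return "VNx8HF";
      case 32: return "VNx4SF";
      case 64: return "VNx2DF";
      default: return NULL;   /* no FP mode of this width in the sketch */
      }
  return za_mode_alt1 (za_bits);
}
```

With (1), both svmopa_za32_f32_m and svmopa_za32_f16_m would access ZA as
VNx4SI. With (2), both would access ZA as VNx4SF, while an integer
intrinsic such as svmopa_za32_s8_m would still use VNx4SI.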

If anyone could shed any light on the rationale here, that would be
much appreciated.

Thanks,
Alex
