Alex Coplan <[email protected]> writes:
> Hi folks,
>
> Hoping for some input from Richard S here (or other AArch64 maintainers).  My
> question is around the modes we use to reference ZA in the SME ACLE
> implementation.
>
> I am particularly curious about the convention described in the following
> comment above sme_2mode_function_t in aarch64-sve-builtins-functions.h:
>
> /* General SME unspec-based functions, parameterized on both the ZA mode
>    and the vector mode.  If the elements of the ZA and vector modes are
>    the same size (e.g. _za64_f64 or _za32_s32) then the two mode arguments
>    are equal, otherwise the first mode argument is the single-vector integer
>    mode associated with the ZA suffix and the second mode argument is the
>    tuple mode associated with the vector suffix.  */
> template<insn_code (*CODE) (int, machine_mode, machine_mode),
>          insn_code (*CODE_SINGLE) (int, machine_mode, machine_mode)>
> class sme_2mode_function_t : public read_write_za<unspec_based_function_base>
> {
> [...]
> }
>
> So essentially this means that for an FP intrinsic like
> svmopa_za32_f32_m, we access ZA in an FP mode (VNx4SFmode), with an insn
> like:
>
> (insn 9 8 0 2 (set (reg:VNx4SF 93 za)
>         (unspec:VNx4SF [
>                 (reg:VNx4SF 93 za)
>                 (reg:DI 89 sme_state)
>                 (const_int 0 [0])
>                 (reg:VNx4BI 103) repeated x2
>                 (reg/v:VNx4SF 101 [ zn ])
>                 (reg/v:VNx4SF 102 [ zm ])
>             ] UNSPEC_SME_FMOPA)) "t.c":6:5 15949 {aarch64_sme_fmopavnx4sfvnx4sf}
>      (nil))
>
> but for a widening FP intrinsic like svmopa_za32_f16_m, we instead get
> an integer mode for ZA (VNx4SImode):
>
> (insn 9 8 0 2 (set (reg:VNx4SI 93 za)
>         (unspec:VNx4SI [
>                 (reg:VNx4SI 93 za)
>                 (reg:DI 89 sme_state)
>                 (const_int 0 [0])
>                 (reg:VNx4BI 103) repeated x2
>                 (reg/v:VNx8HF 101 [ zn ])
>                 (reg/v:VNx8HF 102 [ zm ])
>             ] UNSPEC_SME_FMOPA)) "t.c":12:5 15959 {aarch64_sme_fmopavnx4sivnx8hf}
>      (nil))
>
> which at first I found a little surprising, given that the underlying
> instruction still interprets the ZA contents as floating point.
>
> I was curious about the rationale for this convention.  Possible
> alternatives that come to mind are:
>
> (1) Always using an integer mode for ZA accesses (if it's OK to do it
>     for the widening case above, why not always?)
> (2) Match the ZA mode to the vector operands: so always use an FP mode
>     of the appropriate width when the vector operands are FP operands, and
>     otherwise use an integer mode.
>
> Of these, (2) seems the most natural to me, but I'm sure there's a good
> reason that it's done the way it is.

I don't think there's a perfect choice here.

The mode of ZA is not interpreted strictly according to the usual RTL
semantics.  That would be impossible with the current infrastructure,
since the number of bytes depends on the VL squared.  Instead, the mode
is supposedly just a convenience (although your question suggests it
might fail there).  This works since ZA is a fixed register and must
always be accessed by unspecs that are opaque to target-independent code.

It therefore doesn't matter whether the insn patterns use I modes or
F modes.  That being the case, there didn't seem any point in
distinguishing between "ZA suffixes that map to an I mode" and
"ZA suffixes that map to an F mode".  We might as well just have one
set of ZA suffixes:

  DEF_SME_ZA_SUFFIX (za8, 8, VNx16QImode)
  DEF_SME_ZA_SUFFIX (za16, 16, VNx8HImode)
  DEF_SME_ZA_SUFFIX (za32, 32, VNx4SImode)
  DEF_SME_ZA_SUFFIX (za64, 64, VNx2DImode)
  DEF_SME_ZA_SUFFIX (za128, 128, VNx1TImode)

that map directly to the spec.

That's the reason for not doing (2).  (2) would mean either (a) defining
"integer ZA suffixes" and "FP ZA suffixes", or (b) encoding integerness
or FPness in the function_base (meaning more variations of sme_2mode).

(1) would indeed be OK, which is why that is essentially the underlying
function_instance encoding.  But it would mean that FP instructions that
operate on a single datatype would nevertheless need to be parameterised
on two different modes.  And the way that "@" patterns work is that it
is always the iterator that is passed in place of "<...>", even if the
"<...>" is a mode attribute.  Thus it would not be enough to have:

  (define_insn "@aarch64_<op><FP_ITERATOR:int_equivalent><FP_ITERATOR:mode>"
    ...)

We would need to have two separate iterators: one integer and one FP:

  (define_insn "@aarch64_<op><INT_ITERATOR:mode><FP_ITERATOR:mode>"
    ...)

and use C++ conditions to make sure that they have the same element size.
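
As a rough illustration of the trade-off, the following C++ sketch models
how the two mode arguments are chosen under the current convention versus
option (1).  Everything in it is a toy stand-in: toy_mode, mode_pair,
same_element_size, current_convention, always_integer and pattern_name are
hypothetical names invented for this example, not GCC's machine_mode or any
real function in aarch64-sve-builtins.  The name concatenation only mimics
the pattern names visible in the RTL dumps quoted above, and tuple modes
are ignored for simplicity.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a machine mode (hypothetical; not GCC's machine_mode).
struct toy_mode {
  const char *name;   // lower-case mode name as it appears in pattern names
  unsigned elt_bits;  // element size in bits
  bool is_float;      // true for FP modes, false for integer modes
};

constexpr toy_mode VNx4SI{"vnx4si", 32, false};
constexpr toy_mode VNx4SF{"vnx4sf", 32, true};
constexpr toy_mode VNx8HF{"vnx8hf", 16, true};

// The pair of modes that a two-mode pattern would be parameterised on.
struct mode_pair {
  toy_mode za;
  toy_mode vec;
};

// Models the kind of C++ condition that option (1) would need on
// single-datatype FP patterns with separate integer and FP iterators.
bool same_element_size(toy_mode a, toy_mode b) {
  return a.elt_bits == b.elt_bits;
}

// Current convention, as described in the quoted comment: if the element
// sizes match, both mode arguments are the vector mode; otherwise the
// first is the integer mode associated with the ZA suffix.
mode_pair current_convention(toy_mode za_suffix_mode, toy_mode vec_mode) {
  assert(!za_suffix_mode.is_float);  // ZA suffixes map to integer modes
  if (same_element_size(za_suffix_mode, vec_mode))
    return {vec_mode, vec_mode};
  return {za_suffix_mode, vec_mode};
}

// Option (1): always use the integer ZA mode, even for same-size FP.
mode_pair always_integer(toy_mode za_suffix_mode, toy_mode vec_mode) {
  assert(!za_suffix_mode.is_float);
  return {za_suffix_mode, vec_mode};
}

// Mimics how the two mode names are appended to the pattern name.
std::string pattern_name(const std::string &op, mode_pair m) {
  return "aarch64_sme_" + op + m.za.name + m.vec.name;
}
```

Under these assumptions, pattern_name ("fmopa", current_convention (VNx4SI,
VNx4SF)) gives aarch64_sme_fmopavnx4sfvnx4sf and the widening f16 case gives
aarch64_sme_fmopavnx4sivnx8hf, matching the two dumps.  always_integer would
instead give aarch64_sme_fmopavnx4sivnx4sf for the f32 case, which is where
the second (integer) iterator and the same_element_size style of condition
would come in.
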
Although we do use that type of C++ condition for some mode combinations,
it's better not to lean on it too much, since all combinations do still
exist in a sense.  It's just that the generators make some attempt to
compile out unneeded combinations.

Also, sme_2mode's current approach is consistent with sme_1mode in cases
where the ZA element size matches the vector element size.  This means
that an intrinsic could be converted from sme_1mode to sme_2mode for
later extensions without having to change the existing patterns.
(1) would only achieve that if we standardised on integer ZA modes for
all intrinsics, not just sme_2mode ones, which seemed like an extra
level of complication.

Thanks,
Richard
