https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96342
--- Comment #8 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org>
---
(In reply to yangyang from comment #3)
> The work is mainly composed of three parts: the generating of SVE
> functions for "omp declare simd" in pass_omp_simd_clone, the supporting of
> SVE PCS of non-builtin types, and the generating of the call of SVE
> vectorized functions in pass_vect. I plan to finish this work in the
> following five steps, each step corresponds to a patch:
This plan looks really good to me, thanks. I agree with everything
I've snipped in this reply.
> f) In pass_expand, only when a “SVE type” attribute is added to the tree
> nodes of the types of arguments and return type, these types use the SVE
> PCS. For now, GCC only has a mechanism for adding attributes to SVE builtin
> type, so I plan to define a new hook to add attribute to the types of
> arguments and return type of simdclones generated if needed. The related
> processing functions are planned to be moved to aarch64.c from
> aarch64-sve-builtins.cc in addition.
It's a very minor detail, sorry, but I'd prefer to keep stuff in
aarch64-sve-builtins.cc if possible, and simply export the functions
that we need via aarch64-protos.h.
> Part 4) Add the generating of VLS SVE functions for "omp declare simd". The
> specification writes: “When using a simdlen(len) clause, the compiler
> expects a VLS vector version of the function that is tuned for a specific
> implementation of SVE.” Therefore I think only when the number of bits in
> a SVE vector register of the target is specified and coincides with the
> simdlen clause, GCC is supposed to generate the VLS SVE functions for "omp
> declare simd",
I think in principle we should generate this unconditionally.
There are two possible approaches, in increasing order of
quality of implementation:
(1) Divide the problem into three cases:
    (a) -msve-vector-bits=scalable
        In this case, generate VLA code for the VLS routines.
        The point here is that the VLS interface guarantees
        that the SVE registers are a particular size, but the
        compiler is not required to take advantage of that
        information. Using VLA code is a valid implementation
        choice.
    (b) -msve-vector-bits=N, N matches the simdlen
        For this we'd generate VLS code in the way that you
        describe.
    (c) -msve-vector-bits=N, N does not match the simdlen
        We should silently accept this for declarations, but emit
        a warning or an error if the compiler needs to generate a
        definition.
(2) Allow -msve-vector-bits= to vary on a function-by-function basis,
    in the same way that the set of target features can already vary
    on a function-by-function basis. Then, as a follow-on change,
    use this feature to generate VLS code for whichever simdlen
    the code specifies.
(2) is likely to be tricky, so I'd recommend starting with (1)
and treating (2) as a potential future optimisation.
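To make the case split in (1) concrete, here is a minimal standalone C
sketch of the decision. It is only an illustration, not GCC code: the
enum, the function name and the parameters are all hypothetical, and
simdlen_bits is assumed to be the vector width in bits that the simdlen
clause implies (how that is derived from simdlen is left out).

```c
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in GCC.  */
enum vls_clone_action
{
  VLS_CLONE_USE_VLA_CODE, /* Case (a): implement the VLS routine with VLA code.  */
  VLS_CLONE_USE_VLS_CODE, /* Case (b): generate true VLS code.  */
  VLS_CLONE_ACCEPT_DECL,  /* Case (c), declaration only: accept silently.  */
  VLS_CLONE_DIAGNOSE      /* Case (c), definition needed: warn or error.  */
};

/* SVE_VECTOR_BITS is 0 for -msve-vector-bits=scalable, otherwise N.
   SIMDLEN_BITS is the vector width implied by the simdlen clause
   (an assumption of this sketch).  NEEDS_DEFINITION is true when the
   compiler has to emit a body for the clone rather than just see a
   declaration.  */
static enum vls_clone_action
classify_vls_clone (unsigned int sve_vector_bits,
                    unsigned int simdlen_bits,
                    bool needs_definition)
{
  /* (a) Scalable vectors: VLA code is a valid implementation of the
     VLS interface, since using the known size is optional.  */
  if (sve_vector_bits == 0)
    return VLS_CLONE_USE_VLA_CODE;

  /* (b) Fixed vector length that matches the simdlen.  */
  if (sve_vector_bits == simdlen_bits)
    return VLS_CLONE_USE_VLS_CODE;

  /* (c) Fixed vector length that does not match the simdlen.  */
  return needs_definition ? VLS_CLONE_DIAGNOSE : VLS_CLONE_ACCEPT_DECL;
}
```

For example, classify_vls_clone (128, 256, true) falls into case (c)
with a definition required, so the sketch asks for a diagnostic, while
the same mismatch on a bare declaration is accepted.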
> Part 5) Generate the call of SVE vectorized functions in pass_vect,
> specifically:
>
> a) Define a new hook that returns true if the target supports variable
> vector length simdclones and set the aarch64 return value to true if
> TARGET_SVE. In vectorizable_simd_clone_call, continue analyzing instead
> of directly returning false.
It would be good to generalise existing hooks if possible, rather than
add one specifically for VLA vs. VLS.
> In addition, I have finished the first two patches and attached them on
> this PR. Is it necessary to send the patches to the GCC patches mailing list
> for reviewing?
Yeah, if you could send them to gcc-patches, that'd be great.