On 20/04/2023 15:51, Richard Sandiford wrote:
"Andre Vieira (lists)" <andre.simoesdiasvie...@arm.com> writes:
Hi all,

This is a series of patches/RFCs to implement support in GCC to be able
to target AArch64's libmvec functions that will be/are being added to glibc.
We have chosen to use the omp pragma '#pragma omp declare variant ...'
with a simd construct as the way for glibc to inform GCC what functions
are available.

For example, if we would like to supply a vector version of the scalar
'cosf' we would have an include file with something like:
typedef __attribute__((__neon_vector_type__(4))) float __f32x4_t;
typedef __attribute__((__neon_vector_type__(2))) float __f32x2_t;
typedef __SVFloat32_t __sv_f32_t;
typedef __SVBool_t __sv_bool_t;
__f32x4_t _ZGVnN4v_cosf (__f32x4_t);
__f32x2_t _ZGVnN2v_cosf (__f32x2_t);
__sv_f32_t _ZGVsMxv_cosf (__sv_f32_t, __sv_bool_t);
#pragma omp declare variant(_ZGVnN4v_cosf) \
      match(construct = {simd(notinbranch, simdlen(4))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVnN2v_cosf) \
      match(construct = {simd(notinbranch, simdlen(2))}, device =
{isa("simd")})
#pragma omp declare variant(_ZGVsMxv_cosf) \
      match(construct = {simd(inbranch)}, device = {isa("sve")})
extern float cosf (float);

The BETA ABI can be found in the vfabia64 subdir of
https://github.com/ARM-software/abi-aa/
This currently disagrees with how this patch series implements 'omp
declare simd' for SVE and I also do not see a need for the 'omp declare
variant' scalable extension constructs. I will make changes to the ABI
once we've finalized the co-design of the ABI and this implementation.

I don't see a good reason for dropping the extension("scalable").
The problem is that since the base spec requires a simdlen clause,
GCC should in general raise an error if simdlen is omitted.
Where can you find this in the specs? I tried to find it but couldn't.

Leaving out simdlen in a 'omp declare simd' I assume is OK, our vector ABI defines behaviour for this. But I couldn't find what it meant for a omp declare variant, obviously can't be the same as for declare simd, as that is defined to mean 'define a set of clones' and only one clone can be associated to a declare variant.

But I'm not sure it makes sense to ignore -msve-vector-bits= when
compiling the SVE version (which is what patch 4 seems to do).
If someone compiles with -march=armv8.4-a, we'll use all Armv8.4-A
features in the Advanced SIMD routines.  Why should we ignore
SVE-related target information for the SVE routines?
Not sure I understand what you mean. The vector ABI defines that if a simdlen is omitted that (other than the NEON clones) a SVE VLA clone is available. So how would I take -msve-vector-bits into consideration? Do you mean I ought to add them as options to pass to the function so that it gets used when doing the codegen for the clone (if a function body is available)?

This is where things get a bit iffy for me though... We purposefully generate a SVE simdclone regardless of command-line options, just like x86 does, so why would these options affect simd clone generation but not the actual availability of SVE? Just seems a bit odd...

A viable alternative would be to rely on declare variant for such behaviour, where we could use function attributes to pass specific target options to the variant's prototype to be able to add more specific tuning options per variant. Not sure it will work but I can try it with my rebased patches at some point. I have to admit though, it is not a feature we are looking to use, so not sure it's worth the effort. The SVE simdclone codegen (with function bodies) is already pretty bad, so if we do believe there is a usecase for these, that might be something we should focus on before this sort of more specific tuning.

Of course, the fact that we take command-line options into account
means that omp simd/variant clauses on linkonce/comdat group functions
are an ODR violation waiting to happen.  But the same is true for the
original scalar functions that the clauses are attached to.
Can't find proper definitions of linkonce/comdat group functions so can't comment.


Thanks,
Richard

The patch series has three main steps:
1) Add SVE support for 'omp declare simd', see PR 96342
2) Enable GCC to use omp declare variants with simd constructs as simd
clones during auto-vectorization.
3) Add SLP support for vectorizable_simd_clone_call (This sounded like a
nice thing to add as we want to move away from non-slp vectorization).

Below you can see the list of current Patches/RFCs, the difference being
on how confident I am of the proposed changes. For the RFC I am hoping
to get early comments on the approach, rather than more indepth
code-reviews.

I appreciate we are still in Stage 4, so I can completely understand if
you don't have time to review this now, but I thought it can't hurt to
post these early.

Andre Vieira:
[PATCH] omp: Replace simd_clone_subparts with TYPE_VECTOR_SUBPARTS
[PATCH] parloops: Copy target and optimizations when creating a function
clone
[PATCH] parloops: Allow poly nit and bound
[RFC] omp, aarch64: Add SVE support for 'omp declare simd' [PR 96342]
[RFC] omp: Create simd clones from 'omp declare variant's
[RFC] omp: Allow creation of simd clones from omp declare variant with
-fopenmp-simd flag

Work in progress:
[RFC] vect: Enable SLP codegen for vectorizable_simd_clone_call

Reply via email to