On 27/06/2019 20:54, Steve Ellcey wrote:
> I am testing the latest GCC with not-yet-submitted GLIBC changes that
> implement libmvec on Aarch64.
> 
> While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a
> case where GCC was generating a call to _ZGVnN2vv_powf, that is a
> vectorized powf call for 2 (not 4) elements.  This was a problem
> because I only implemented a 4 element 32 bit vectorized powf function
> for libmvec and not a 2 element version.
> 
> I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen
> which allows for (element count * element size) to be either 64
> or 128.
> 
> I would like some thoughts on what we should do about this, should
> we require glibc/libmvec to provide 2 element 32 bit floating point
> vector functions (as well as the 4 element ones) or should we change
> aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4
> element (128 total bit size) vectors and not 2 element (64 total bit
> size) ones?

in the "declare simd" syntax a "simdlen" clause can specify
the vector length, and then only that length will be used.

so either you provide both 4 and 2 length powf or
it has to be declared as

#pragma omp declare simd simdlen(4) notinbranch
float powf(float, float);
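
to make the symbol requirement concrete, here is a small c sketch
(the helper names are made up for illustration, they are not gcc or
glibc apis). it assumes the vector function abi name layout seen in
this thread, where "n" selects advsimd, "N" means unmasked, the number
is the simdlen and each "v" is one vector argument:

```c
#include <stdio.h>

/* Total vector width in bits for `simdlen` lanes of `elem_bits` each. */
static unsigned vector_bits(unsigned simdlen, unsigned elem_bits)
{
    return simdlen * elem_bits;
}

/* Hypothetical helper: format the advsimd, unmasked variant name of a
   two-argument scalar function, e.g. _ZGVnN4vv_powf.
   Returns the number of characters snprintf would write. */
static int advsimd_variant_name(char *buf, size_t n,
                                unsigned simdlen, const char *scalar)
{
    return snprintf(buf, n, "_ZGVnN%uvv_%s", simdlen, scalar);
}
```

so for float a simdlen of 2 fills a 64 bit vector and needs
_ZGVnN2vv_powf, while a simdlen of 4 fills a 128 bit vector and needs
_ZGVnN4vv_powf. with the simdlen(4) declaration above only the
latter has to exist in libmvec.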

now, we have the issue that this still allows
vectorizing powf as an sve call in future compilers.
to deal with that, omp 5.0 has "declare variant", which
can specify exactly one simd variant and its name:

#pragma omp declare variant(_ZGVnN4vv_powf) \
 match(construct={simd(simdlen(4), notinbranch)}, device={isa("simd")})
float powf(float, float);

gcc does not support this currently and it will
require a new attribute syntax too. the next vector
abi update will specify this omp pragma and i will
write something about how the attribute should work.
unfortunately i don't see an easy way out of this.

if there is a way to detect that the compiler only
supports advsimd vector calls and not sve calls
then we could do

#if supported_omp_version >= 5
#pragma omp declare variant(_ZGVnN4vv_powf) \
 match(construct={simd(simdlen(4), notinbranch)}, device={isa("simd")})
#elif only_advsimd_vector_call_support
#pragma omp declare simd simdlen(4) notinbranch
#endif
float powf(float, float);

i don't know yet if we can do this reliably in
glibc math.h (and we have a fortran declaration
problem too).
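
for the version half of that check, one candidate is the standard
_OPENMP macro, which expands to the yyyymm date of the supported
spec (201811 for omp 5.0). a rough sketch, where
HAVE_OMP_DECLARE_VARIANT, OMP_5_0_DATE and omp_at_least_5_0 are names
invented here. as far as i know gcc only defines _OPENMP under
-fopenmp and not under -fopenmp-simd, so this alone may not be
reliable enough:

```c
/* _OPENMP is the yyyymm date of the OpenMP spec the compiler claims
   to support; 201811 corresponds to OpenMP 5.0, which introduced
   "declare variant". */
#define OMP_5_0_DATE 201811L

/* Hypothetical feature macro, not something glibc or gcc defines. */
#if defined(_OPENMP) && _OPENMP >= OMP_5_0_DATE
# define HAVE_OMP_DECLARE_VARIANT 1
#else
# define HAVE_OMP_DECLARE_VARIANT 0
#endif

/* Same comparison as a function, so it can be checked at run time. */
static int omp_at_least_5_0(long omp_macro_value)
{
    return omp_macro_value >= OMP_5_0_DATE;
}
```

the advsimd-only half of the check has no macro that i can point to,
so it stays a placeholder.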

> 
> This is obviously a question for the pre-SVE vector instructions,
> I am not sure how this would be handled in SVE.
> 
> Steve Ellcey
> sell...@marvell.com
> 
> P.S.  Here a test case in Fortran that generated the 2 element
>       vector call.  It unrolled the loop into one vector call
>       of 2 elements and one scalar call.
> 
>       SUBROUTINE FOO(B,W,P)
>       REAL, DIMENSION (3) :: W, P
>       DO 10 I = 1, 3
>       P(I) = W(I) ** B
> 10    CONTINUE
>       END SUBROUTINE FOO
> 
