On 27/06/2019 20:54, Steve Ellcey wrote:
> I am testing the latest GCC with not-yet-submitted GLIBC changes that
> implement libmvec on Aarch64.
>
> While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a
> case where GCC was generating a call to _ZGVnN2vv_powf, that is a
> vectorized powf call for 2 (not 4) elements.  This was a problem
> because I only implemented a 4 element 32 bit vectorized powf function
> for libmvec and not a 2 element version.
>
> I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen
> which allows for (element count * element size) to be either 64
> or 128.
>
> I would like some thoughts on what we should do about this, should
> we require glibc/libmvec to provide 2 element 32 bit floating point
> vector functions (as well as the 4 element ones) or should we change
> aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4
> element (128 total bit size) vectors and not 2 element (64 total bit
> size) ones?
In the "declare simd" syntax, "simdlen" can specify a vector length, and then
only that length will be used. So either you provide both the 4 and the 2
element powf, or it has to be declared as

#pragma omp declare simd simdlen(4) notinbranch
float powf(float, float);

Now, we have the issue that this still allows vectorizing powf as an SVE call
in future compilers. To deal with that, OpenMP 5.0 has "declare variant",
which can specify exactly one simd variant and its name:

#pragma omp declare variant(_ZGVnN4vv_powf) \
    match(construct={simd(simdlen(4), notinbranch)}, device={isa("simd")})
float powf(float, float);

GCC does not support this currently, and it will require a new attribute
syntax too. The next vector ABI update will specify this OpenMP pragma, and I
will write something about how the attribute should work. Unfortunately, I
don't see an easy way out of this.

If there is a way to detect that the compiler only supports AdvSIMD vector
calls and not SVE calls, then we could do

#if supported_omp_version >= 5
#pragma omp declare variant(_ZGVnN4vv_powf) \
    match(construct={simd(simdlen(4), notinbranch)}, device={isa("simd")})
#elif only_advsimd_vector_call_support
#pragma omp declare simd simdlen(4) notinbranch
#endif
float powf(float, float);

I don't know yet if we can do this reliably in glibc math.h (and we have a
Fortran declaration problem too).

>
> This is obviously a question for the pre-SVE vector instructions,
> I am not sure how this would be handled in SVE.
>
> Steve Ellcey
> sell...@marvell.com
>
> P.S. Here a test case in Fortran that generated the 2 element
> vector call.  It unrolled the loop into one vector call
> of 2 elements and one scalar call.
>
> SUBROUTINE FOO(B,W,P)
> REAL, DIMENSION (3) :: W, P
> DO 10 I = 1, 3
> P(I) = W(I) ** B
> 10 CONTINUE
> END SUBROUTINE FOO
>