https://bugs.llvm.org/show_bug.cgi?id=40265
Bug ID: 40265
Summary: autovectorization of repeated calls to vectorizable functions fails
Product: clang
Version: 6.0
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: C++
Assignee: [email protected]
Reporter: [email protected]
CC: [email protected], [email protected],
[email protected], [email protected],
[email protected]
After consulting the documentation at
https://llvm.org/docs/Vectorizers.html
I tried to trigger autovectorization of loops that call vectorizable functions.
In the section 'Vectorization of function calls', the documentation lists the
following functions as vectorizable:
pow exp exp2
sin cos sqrt
log log2 log10
fabs floor ceil
fma trunc nearbyint
fmuladd
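To check the list entry by entry, the reporter's single-function test can be
widened to one loop per listed function, so that each -Rpass remark points at
exactly one call. This is a sketch under assumptions: the file name funcs.cc
and the function names vf_pow, vf_exp, ... are hypothetical, the arrays are
defined rather than declared extern so the file links on its own, and fmuladd
has no <cmath> counterpart, so it is approximated by a*b + c, which clang may or
may not turn into llvm.fmuladd depending on the -ffp-contract setting.

```cpp
// Hypothetical test file funcs.cc: one loop per function from the list,
// so the loop-vectorize remarks can be read off function by function.
// Compile with the flags used in the report, e.g.:
//   clang++ -fvectorize -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize
//           -std=c++11 -O3 -mavx2 -S -o funcs.s funcs.cc
#include <cmath>

const int N = 32768;
// Defined here (the report uses 'extern') so the file links stand-alone.
float data[N], data_b[N], data_c[N];

void vf_pow()       { for (int i = 0; i < N; i++) data[i] = std::pow(data[i], data_b[i]); }
void vf_exp()       { for (int i = 0; i < N; i++) data[i] = std::exp(data[i]); }
void vf_exp2()      { for (int i = 0; i < N; i++) data[i] = std::exp2(data[i]); }
void vf_sin()       { for (int i = 0; i < N; i++) data[i] = std::sin(data[i]); }
void vf_cos()       { for (int i = 0; i < N; i++) data[i] = std::cos(data[i]); }
void vf_sqrt()      { for (int i = 0; i < N; i++) data[i] = std::sqrt(data[i]); }
void vf_log()       { for (int i = 0; i < N; i++) data[i] = std::log(data[i]); }
void vf_log2()      { for (int i = 0; i < N; i++) data[i] = std::log2(data[i]); }
void vf_log10()     { for (int i = 0; i < N; i++) data[i] = std::log10(data[i]); }
void vf_fabs()      { for (int i = 0; i < N; i++) data[i] = std::fabs(data[i]); }
void vf_floor()     { for (int i = 0; i < N; i++) data[i] = std::floor(data[i]); }
void vf_ceil()      { for (int i = 0; i < N; i++) data[i] = std::ceil(data[i]); }
void vf_fma()       { for (int i = 0; i < N; i++) data[i] = std::fma(data[i], data_b[i], data_c[i]); }
void vf_trunc()     { for (int i = 0; i < N; i++) data[i] = std::trunc(data[i]); }
void vf_nearbyint() { for (int i = 0; i < N; i++) data[i] = std::nearbyint(data[i]); }
// fmuladd: no <cmath> counterpart; a*b + c may be contracted to llvm.fmuladd.
void vf_fmuladd()   { for (int i = 0; i < N; i++) data[i] = data[i] * data_b[i] + data_c[i]; }
```

The functions are deliberately kept separate rather than generated from a
template or macro, so that each remark carries its own source location.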
I found that most of the listed functions are not autovectorized. Since some of
them (e.g. floor, ceil, trunc) are autovectorized, I was able to patch the
resulting assembler code, replacing the vector opcodes (vroundps with vsqrtps,
also adapting the operand pattern), and found that the resulting binary was
significantly faster and worked as intended. I did this for 'sqrt' as an
example on my AVX2 system and got a speedup of about 400%. So my guess is that
the autovectorization opportunity is simply missed: the machinery to generate
vector code for this loop pattern is evidently present and working. The
compiler does indeed report that it is unable to vectorize. I was using this
test code:
#include <cmath>
extern float data [ 32768 ] ;
extern void vf1()
{
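// clang spells its loop hint '#pragma clang loop vectorize(enable)';
// the form below is not recognized by clang and is ignored
#pragma vectorize enable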
for ( int i = 0 ; i < 32768 ; i++ )
data [ i ] = std::sqrt ( data [ i ] ) ;
}
and this compiler call:
clang++ -fvectorize -Rpass=loop-vectorize -Rpass-analysis=loop-vectorize -std=c++11 -O3 -mavx2 -S -o sqrt.s sqrt_gcc.cc
resulting in these diagnostic messages:
/usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cmath:464:12:
remark: loop not vectorized: call instruction cannot be vectorized
[-Rpass-analysis=loop-vectorize]
{ return __builtin_sqrtf(__x); }
^
sqrt_gcc.cc:14:3: remark: loop not vectorized: read with atomic ordering or
volatile read [-Rpass-analysis=loop-vectorize]
for ( int i = 0 ; i < 32768 ; i++ )
Using e.g. 'trunc' instead of 'sqrt', the loop vectorizes correctly.
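The hand-patched assembly described above (vroundps swapped for vsqrtps) can be
approximated in source form with SSE intrinsics: the vector body takes packed
square roots of four floats per iteration and a scalar loop handles the
remainder. This is an illustration, not what clang would emit: the name
sqrt_simd is hypothetical, it uses 128-bit _mm_sqrt_ps rather than the 256-bit
registers an AVX2 build would use (so it needs no extra flags on x86-64; with
-mavx/-mavx2 the same intrinsic is encoded as vsqrtps), and on targets without
SSE it falls back to the scalar loop.

```cpp
#include <cmath>
#if defined(__SSE__)
#include <xmmintrin.h>  // _mm_loadu_ps, _mm_sqrt_ps, _mm_storeu_ps
#endif

// Hypothetical helper: replaces each of the n floats at p by its square root.
void sqrt_simd(float *p, int n) {
  int i = 0;
#if defined(__SSE__)
  // Vector body: four lanes per iteration, one packed square root (sqrtps).
  for (; i + 4 <= n; i += 4) {
    __m128 v = _mm_loadu_ps(p + i);
    _mm_storeu_ps(p + i, _mm_sqrt_ps(v));
  }
#endif
  // Scalar remainder (or the whole loop when SSE is not available).
  for (; i < n; ++i)
    p[i] = std::sqrt(p[i]);
}
```

Calling sqrt_simd(data, 32768) does the same work as the loop in vf1 above;
comparing it against the compiler's scalar output is one way to reproduce the
kind of speedup measurement described in the report.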
I found an old thread on the clang-developers list complaining about this
behaviour:
http://clang-developers.42468.n3.nabble.com/Bug-with-vectorization-of-transcendental-functions-tc4041229.html#a4041291
but it seems it never reached a conclusion, so I am submitting this bug report
in the hope of reviving the topic.
With regards
Kay F. Jahnke
llvm-bugs mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs