https://gcc.gnu.org/bugzilla/show_bug.cgi?id=48037
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Severity|normal                      |enhancement

--- Comment #11 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The last part of this is SLP of sqrt. I don't know if it is because we don't
do SLP for functions or we don't treat vector constructors as store points.

Note clang/LLVM can do SLP on x86 (but not on aarch64) and get the vectorized
sqrt.

Here is a generic testcase where we should do the SLP:

#include <math.h>

typedef __attribute__((vector_size(16))) double __m128d;

static inline __m128d _mm_set_pd(double a, double b)
{
  return (__m128d){b,a};
}

__m128d vsqrt1 (__m128d const& x)
{
  double const* __restrict__ const y = (double const*)&x;
  double const a = sqrt(y[0]);
  double const b = sqrt(y[1]);
  return _mm_set_pd(b,a);
}

__m128d vsqrt2 (__m128d const x)
{
  double const* __restrict__ const y = (double const*)&x;
  double const a = sqrt(y[0]);
  double const b = sqrt(y[1]);
  return _mm_set_pd(b,a);
}
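
For readers checking the lane order in the testcase: the sketch below restates the by-value variant with GCC vector subscripting and confirms that lane i of the result is sqrt of lane i of the input. That is why an SLP-vectorized version could be a single vector sqrt with no shuffle. The names v2df, set_pd, sqrt_scalar_lanes and check_lane_order are hypothetical, chosen only for this illustration and not taken from the bug report. The sketch assumes a compiler that supports GCC vector extensions (GCC or clang).

```cpp
#include <cassert>
#include <cmath>

// Hypothetical name for illustration; the testcase calls this type __m128d.
typedef double v2df __attribute__((vector_size(16)));

// Same argument order as the testcase's _mm_set_pd: the second argument
// lands in lane 0, the first in lane 1.
static inline v2df set_pd(double a, double b) { return (v2df){b, a}; }

// Scalar form, mirroring vsqrt2: two independent sqrt calls feeding a
// vector constructor. This is the pattern SLP would need to recognize.
v2df sqrt_scalar_lanes(v2df x) {
    double const a = std::sqrt(x[0]);
    double const b = std::sqrt(x[1]);
    return set_pd(b, a);  // lane 0 = a, lane 1 = b
}

// Sanity check of the lane order: lane i of the result is sqrt of lane i
// of the input, so the two scalar calls map directly onto one vector sqrt.
void check_lane_order() {
    v2df const x = {4.0, 9.0};
    v2df const r = sqrt_scalar_lanes(x);
    assert(r[0] == 2.0);
    assert(r[1] == 3.0);
}
```

Calling check_lane_order() from a small driver is enough to run the check. To see whether a given compiler performs the SLP, one would inspect the generated assembly for sqrt_scalar_lanes. The options needed are an assumption here, but scalar sqrt calls are generally not vectorized unless errno handling is disabled, for example with -fno-math-errno.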