Hi!

All the vec_duplicate_p uses in simplify-rtx.c assume that, when it
returns true, *elt is a scalar vector element.  That is the case for
CONST_VECTOR, which can't contain anything but scalars, but VEC_DUPLICATE
is documented as:

"This operation converts a scalar into a vector or a small vector into a
larger one by duplicating the input values.  The output vector mode must
have the same submodes as the input vector mode or the scalar modes, and
the number of output parts must be an integer multiple of the number of
input parts."

and e.g. on x86 that is heavily used in the backend.  So simplify-rtx.c
calls vec_duplicate_p expecting to be given a scalar element, but for a
VEC_DUPLICATE of a smaller vector it gets back the operand itself, which
has a different (vector) mode.
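
To illustrate the mismatch outside of GCC, here is a minimal toy model.
Everything in it (Code, Mode, Rtx, old_vec_duplicate_p,
new_vec_duplicate_p) is a hypothetical stand-in invented for this sketch,
not GCC's real rtx, machine_mode or rtl.h API; it only mirrors the shape
of the check, assuming the operand's mode can be asked whether it is a
vector mode.

```cpp
// Toy model only: hypothetical stand-ins, not GCC's rtx or machine_mode.
#include <cassert>

enum Code { REG, VEC_DUPLICATE };

struct Mode
{
  bool is_vector;   // stand-in for VECTOR_MODE_P (mode)
  int nunits;       // 1 for scalar modes
};

struct Rtx
{
  Code code;
  Mode mode;
  const Rtx *op0;   // stand-in for XEXP (x, 0)
};

// Unpatched shape: any VEC_DUPLICATE matches, even when its operand
// is itself a (smaller) vector.
inline bool
old_vec_duplicate_p (const Rtx *x, const Rtx **elt)
{
  if (x->code == VEC_DUPLICATE)
    {
      *elt = x->op0;
      return true;
    }
  return false;
}

// Patched shape: match only when the operand has a non-vector mode,
// so a caller that gets true always receives a scalar in *elt.
inline bool
new_vec_duplicate_p (const Rtx *x, const Rtx **elt)
{
  if (x->code == VEC_DUPLICATE && !x->op0->mode.is_vector)
    {
      *elt = x->op0;
      return true;
    }
  return false;
}
```

With a VEC_DUPLICATE whose operand is a two-element vector, the first
predicate in this model returns true and hands back a vector-mode
operand, while the second returns false and leaves *elt untouched; for
a scalar operand both return true.
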

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-01-05  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/83682
	* rtl.h (const_vec_duplicate_p): Only return true for VEC_DUPLICATE
	if it has non-VECTOR_MODE element mode.
	(vec_duplicate_p): Likewise.

	* gcc.target/i386/pr83682.c: New test.

--- gcc/rtl.h.jj	2018-01-04 00:43:14.704702340 +0100
+++ gcc/rtl.h	2018-01-05 16:03:52.420823342 +0100
@@ -2969,7 +2969,9 @@ const_vec_duplicate_p (T x, T *elt)
       *elt = CONST_VECTOR_ENCODED_ELT (x, 0);
       return true;
     }
-  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+  if (GET_CODE (x) == CONST
+      && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE
+      && !VECTOR_MODE_P (GET_MODE (XEXP (XEXP (x, 0), 0))))
     {
       *elt = XEXP (XEXP (x, 0), 0);
       return true;
@@ -2984,7 +2986,8 @@ template <typename T>
 inline bool
 vec_duplicate_p (T x, T *elt)
 {
-  if (GET_CODE (x) == VEC_DUPLICATE)
+  if (GET_CODE (x) == VEC_DUPLICATE
+      && !VECTOR_MODE_P (GET_MODE (XEXP (x, 0))))
     {
       *elt = XEXP (x, 0);
       return true;
--- gcc/testsuite/gcc.target/i386/pr83682.c.jj	2018-01-05 16:06:57.299885887 +0100
+++ gcc/testsuite/gcc.target/i386/pr83682.c	2018-01-05 16:06:26.630875486 +0100
@@ -0,0 +1,17 @@
+/* PR rtl-optimization/83682 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef float V __attribute__((__vector_size__(16)));
+typedef double W __attribute__((__vector_size__(16)));
+V b;
+W c;
+
+void
+foo (void *p)
+{
+  V e = __builtin_ia32_cvtsd2ss (b, c);
+  V g = e;
+  float f = g[0];
+  __builtin_memcpy (p, &f, sizeof (f));
+}

	Jakub