Hi!

On the following testcase, simplify_unary_operation is called on
VEC_DUPLICATE from (vec_duplicate:V4SF something:SF) to V8SFmode.
simplify_unary_operation_1 then tries an optimization that is usable for
most unary operations: it attempts to build
(vec_duplicate:V8SF (unary:SF something:SF)).  That is reasonable for all
unary ops except when unary is itself vec_duplicate, because vec_duplicate
needs a vector outer mode and a scalar or vector inner mode, not scalar
outer and inner modes.
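
To illustrate the mode problem outside of GCC, here is a toy sketch in C.
It is only a model under stated assumptions: every toy_* name is
hypothetical and not part of GCC, and a mode is reduced to its element
count (1 meaning scalar).  It shows why applying the operator to the
duplicated element is fine for something like NEG but yields an ill-formed
scalar vec_duplicate when the operator is VEC_DUPLICATE itself, which is
what the extra check in the guard rules out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for rtx codes and machine modes.  These are
   illustrative assumptions, not the real GCC definitions.  */
enum toy_code { TOY_NEG, TOY_ABS, TOY_VEC_DUPLICATE };

/* A mode is modelled only by its number of elements: 1 means scalar.  */
struct toy_mode { int nunits; };

static bool
toy_vector_mode_p (struct toy_mode m)
{
  return m.nunits > 1;
}

/* Element (scalar) mode of M in this toy model.  */
static struct toy_mode
toy_inner_mode (struct toy_mode m)
{
  struct toy_mode scalar = { 1 };
  assert (m.nunits >= 1);
  return scalar;
}

/* Is (code:RESULT ...) well formed?  In this model the only rule is
   that vec_duplicate must produce a vector.  */
static bool
toy_valid_unary_p (enum toy_code code, struct toy_mode result)
{
  if (code == TOY_VEC_DUPLICATE)
    return toy_vector_mode_p (result);
  return true;
}

/* Guard for rewriting (code:MODE (vec_duplicate elt)) into
   (vec_duplicate:MODE (code:INNER elt)), mirroring the shape of the
   patched condition.  */
static bool
toy_can_apply_to_element (enum toy_code code, struct toy_mode mode,
			  bool op_is_duplicate)
{
  return (toy_vector_mode_p (mode)
	  && op_is_duplicate
	  && code != TOY_VEC_DUPLICATE);
}
```

In this model, toy_valid_unary_p (TOY_VEC_DUPLICATE, toy_inner_mode (m))
is false for any vector mode m, so the guard has to reject that code
before the rewrite is attempted.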

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2018-03-20  Jakub Jelinek  <ja...@redhat.com>

	PR rtl-optimization/84989
	* simplify-rtx.c (simplify_unary_operation_1): Don't try to simplify
	VEC_DUPLICATE with scalar result mode.

	* gcc.target/i386/pr84989.c: New test.

--- gcc/simplify-rtx.c.jj	2018-01-20 10:52:47.000000000 +0100
+++ gcc/simplify-rtx.c	2018-03-20 17:13:11.906809795 +0100
@@ -1692,7 +1692,9 @@ simplify_unary_operation_1 (enum rtx_cod
       break;
     }
 
-  if (VECTOR_MODE_P (mode) && vec_duplicate_p (op, &elt))
+  if (VECTOR_MODE_P (mode)
+      && vec_duplicate_p (op, &elt)
+      && code != VEC_DUPLICATE)
     {
       /* Try applying the operator to ELT and see if that simplifies.
 	 We can duplicate the result if so.
--- gcc/testsuite/gcc.target/i386/pr84989.c.jj	2018-03-20 17:20:46.840921141 +0100
+++ gcc/testsuite/gcc.target/i386/pr84989.c	2018-03-20 17:19:57.257903317 +0100
@@ -0,0 +1,12 @@
+/* PR rtl-optimization/84989 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f" } */
+
+#include <x86intrin.h>
+
+__m512
+foo (float a, float *b)
+{
+  return _mm512_sub_ps (_mm512_broadcast_f32x4 (_mm_load_ps (b)),
+			_mm512_broadcast_f32x4 (_mm_set1_ps (a)));
+}

	Jakub