https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What            |Removed             |Added
----------------------------------------------------------------------------
             Target        |x86_64 i?86         |x86_64-*-* i?86-*-*
           Keywords        |                    |missed-optimization
   Last reconfirmed        |                    |2020-12-07
     Ever confirmed        |0                   |1
             Status        |UNCONFIRMED         |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, vector lowering performs this optimization. But then in GIMPLE we have

__m128 f (__m128 a, __m128 b)
{
  vector(4) float _3;
  vector(4) float _5;
  vector(4) float _6;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufps (b_2(D), b_2(D), 0);
  _5 = __builtin_ia32_shufps (a_4(D), a_4(D), 0);
  _6 = _3 * _5;
  return _6;
}

so we don't actually see the operation. To rectify this the backend would need
to GIMPLE-fold those calls once the MASK argument becomes constant -- that is,
fold them to a VEC_PERM_EXPR of VIEW_CONVERTed operands. Vector lowering
doesn't perform generic permute optimizations; the vectorizer does, but it
doesn't touch existing code. I guess it could be done in some new pass similar
to backprop (but with dataflow in the other direction).