https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98167

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|x86_64 i?86                 |x86_64-*-* i?86-*-*
           Keywords|                            |missed-optimization
   Last reconfirmed|                            |2020-12-07
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Btw, vector lowering performs this optimization.  But then in GIMPLE we have

__m128 f (__m128 a, __m128 b)
{
  vector(4) float _3;
  vector(4) float _5;
  vector(4) float _6;

  <bb 2> [local count: 1073741824]:
  _3 = __builtin_ia32_shufps (b_2(D), b_2(D), 0);
  _5 = __builtin_ia32_shufps (a_4(D), a_4(D), 0);
  _6 = _3 * _5;
  return _6;
}

so we don't actually see the operation.  To rectify this the backend would
need to GIMPLE-fold those calls once the mask argument becomes constant,
that is, fold them to a VEC_PERM_EXPR of VIEW_CONVERTed operands.
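
To illustrate what such a fold would compute, here is a minimal sketch in
plain C.  It is a scalar model only, not GCC code: the names shufps_model,
vec_perm_model, shufps_imm_to_sel and count_mismatches are hypothetical
helpers made up for this example.  It assumes the usual SHUFPS semantics
(the two low result lanes are picked from the first operand, the two high
lanes from the second) and the VEC_PERM_EXPR convention that selector values
0..3 index the first vector and 4..7 the second.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 4u

/* Scalar model of SHUFPS: each 2-bit field of the immediate picks a lane;
   result lanes 0-1 come from a, result lanes 2-3 come from b.  */
static void shufps_model(const float a[LANES], const float b[LANES],
                         uint8_t imm, float out[LANES])
{
  out[0] = a[imm & 3];
  out[1] = a[(imm >> 2) & 3];
  out[2] = b[(imm >> 4) & 3];
  out[3] = b[(imm >> 6) & 3];
}

/* Scalar model of VEC_PERM_EXPR <v0, v1, sel>: selector values below LANES
   index v0, the remaining ones index v1.  */
static void vec_perm_model(const float v0[LANES], const float v1[LANES],
                           const unsigned sel[LANES], float out[LANES])
{
  for (unsigned i = 0; i < LANES; i++) {
    assert(sel[i] < 2 * LANES);
    out[i] = sel[i] < LANES ? v0[sel[i]] : v1[sel[i] - LANES];
  }
}

/* Hypothetical fold step: turn a constant SHUFPS immediate into the
   equivalent two-input permute selector.  */
static void shufps_imm_to_sel(uint8_t imm, unsigned sel[LANES])
{
  sel[0] = imm & 3;
  sel[1] = (imm >> 2) & 3;
  sel[2] = LANES + ((imm >> 4) & 3);
  sel[3] = LANES + ((imm >> 6) & 3);
}

/* Compare both models over all 256 immediates; returns the number of
   lanes on which they disagree.  */
static int count_mismatches(void)
{
  const float a[LANES] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float b[LANES] = {5.0f, 6.0f, 7.0f, 8.0f};
  int bad = 0;

  for (unsigned imm = 0; imm < 256; imm++) {
    float via_shufps[LANES], via_perm[LANES];
    unsigned sel[LANES];

    shufps_model(a, b, (uint8_t) imm, via_shufps);
    shufps_imm_to_sel((uint8_t) imm, sel);
    vec_perm_model(a, b, sel, via_perm);

    for (unsigned i = 0; i < LANES; i++)
      if (via_shufps[i] != via_perm[i])
        bad++;
  }
  return bad;
}
```

For the mask 0 used in the dump above, the decoded selector is { 0, 0, 4, 4 }.
Since both operands of each call are the same vector, that is a broadcast of
lane 0, which generic permute handling could recognize once it is expressed
as a VEC_PERM_EXPR.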

Vector lowering doesn't perform generic permute optimizations; the vectorizer
does, but it doesn't touch existing code.  I guess it could be done in some
new pass similar to backprop (but with dataflow going the other way around).
