[Bug tree-optimization/88828] Inefficient update of the first element of vector registers

rguenth at gcc dot gnu.org Mon, 14 Jan 2019 02:24:42 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88828


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2019-01-14
                 CC|                            |rguenth at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think there are related bugs.  foo1 is optimized OK:

  y_4 = BIT_INSERT_EXPR <x_2(D), f_3(D), 0 (32 bits)>;
  return y_4;

while foo is expanded from

  <bb 2> [local count: 1073741824]:
  _1 = BIT_FIELD_REF <x_7(D), 32, 32>;
  _2 = BIT_FIELD_REF <x_7(D), 32, 64>;
  _3 = BIT_FIELD_REF <x_7(D), 32, 96>;
  y_6 = {f_5(D), _1, _2, _3};
  return y_6;

Tree forwprop contains code that pattern-matches vector CONSTRUCTORs;
I think it could be extended to handle this case.  IIRC it can already
detect arbitrary two-vector permutes.  For the above we could go
through an intermediate

  _1 = {f_5(D), f_5(D), ... };
  y_6 = VEC_PERM <_1, x_7(D), { .... }>;

and recognize permutes that only replace a single vector element.
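
To make the intermediate form concrete, here is a minimal sketch in plain C
that models a two-input permute on arrays and checks that "splat f, then
permute with x" reproduces the constructor {f, x[1], x[2], x[3]}.  The helper
names (vec_perm, splat_permute_matches_constructor) and the selector
{0, 5, 6, 7} are assumptions made for illustration; the comment above elides
the actual selector, and this is not GCC's implementation.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4

/* Hypothetical model of a two-input permute: each selector value indexes
   the concatenation of a (lanes 0..3) and b (lanes 4..7).  This model
   requires in-range selector values.  */
static void
vec_perm (const float *a, const float *b, const unsigned *sel, float *out)
{
  for (size_t i = 0; i < LANES; i++)
    {
      assert (sel[i] < 2 * LANES);
      out[i] = sel[i] < LANES ? a[sel[i]] : b[sel[i] - LANES];
    }
}

/* Return 1 if permuting a splat of f with x yields {f, x[1], x[2], x[3]}.
   The selector is an assumed example: lane 0 comes from the splat, the
   remaining lanes come from x unchanged.  */
static int
splat_permute_matches_constructor (const float *x, float f)
{
  float splat[LANES] = { f, f, f, f };
  unsigned sel[LANES] = { 0, 5, 6, 7 };
  float expect[LANES] = { f, x[1], x[2], x[3] };
  float got[LANES];

  vec_perm (splat, x, sel, got);
  for (size_t i = 0; i < LANES; i++)
    if (got[i] != expect[i])
      return 0;
  return 1;
}
```

In this model, a selector that takes exactly one lane from the splat and every
other lane from x at the same position is the "replace a single element" case
that could be turned into an insert.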

So I think we should optimize

__v4sf
foo (__v4sf x, float f)
{
  __v4sf y = __extension__ (__v4sf) { f, x[2], x[1], x[3] };
  return y;
}

as well, first permuting x and then inserting f (at any position).
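
The decomposition can be sketched at the source level with the GNU C vector
extensions.  This is only an illustration of the equivalence, not the
transformation itself: the v4sf typedef stands in for __v4sf, and the function
names foo_constructor, foo_permute_insert and forms_agree are hypothetical.

```c
#include <assert.h>

/* Assumed stand-in for __v4sf: four floats in a 16-byte vector.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* The constructor form from the example above.  */
static v4sf
foo_constructor (v4sf x, float f)
{
  v4sf y = { f, x[2], x[1], x[3] };
  return y;
}

/* The same result as two steps: a one-input permute of x that swaps
   lanes 1 and 2, followed by a single-element insert of f at lane 0.  */
static v4sf
foo_permute_insert (v4sf x, float f)
{
  v4sf p = { x[0], x[2], x[1], x[3] };
  p[0] = f;
  return p;
}

/* Return 1 if both forms agree lane by lane for the given inputs.  */
static int
forms_agree (v4sf x, float f)
{
  v4sf a = foo_constructor (x, f);
  v4sf b = foo_permute_insert (x, f);

  for (int i = 0; i < 4; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}

/* Small self-check with arbitrary sample values.  */
static void
self_check (void)
{
  v4sf x = { 1.0f, 2.0f, 3.0f, 4.0f };
  assert (forms_agree (x, 9.0f));
}
```

Lane 0 of the permuted vector is overwritten by the insert, so the permute is
free to put anything there; x[0] is used here only to keep it a pure
permutation of x.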
