[Bug tree-optimization/95845] Failure to optimize vector load made in separate operations to single load

rguenth at gcc dot gnu.org Tue, 23 Jun 2020 23:52:31 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95845


Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2020-06-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
So this is

  VIEW_CONVERT_EXPR<float[2]>(r)[0] = *ptr;
  VIEW_CONVERT_EXPR<float[2]>(r)[1] = *(ptr + 4);

from the FEs and

  _1 = *ptr_4(D);
  r_6 = BIT_INSERT_EXPR <r_5(D), _1, 0>;
  _2 = MEM[(const float *)ptr_4(D) + 4B];
  r_7 = BIT_INSERT_EXPR <r_6, _2, 32>;

after SSA rewrite.  There's no further combining of inserts happening.
I guess forwprop might want to see whether an insert chain forms a full
CTOR.  BB vectorization might also be a candidate to look at, but it
would be quite late.
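
For reference, a minimal sketch of the kind of source that yields the
dumps above.  This is an assumption, not the testcase from the report:
the typedef and function names are made up, and it relies on the
GCC/Clang vector_size extension.  The second function shows the single
8-byte load that the insert chain is equivalent to.

```c
#include <string.h>

/* Hypothetical 8-byte vector of two floats (GCC/Clang vector extension). */
typedef float v2sf __attribute__((vector_size(8)));

/* Element-wise form: two scalar loads, each inserted into r.
   This is the shape that becomes a chain of two BIT_INSERT_EXPRs. */
v2sf load_elementwise(const float *ptr)
{
    v2sf r;
    r[0] = ptr[0];
    r[1] = ptr[1];   /* byte offset 4, i.e. *(ptr + 4) in the FE dump */
    return r;
}

/* Equivalent form with one 8-byte load, written with memcpy so that
   no alignment beyond that of float is assumed. */
v2sf load_single(const float *ptr)
{
    v2sf r;
    memcpy(&r, ptr, sizeof r);
    return r;
}
```

Both functions return the same value for any readable pair of floats;
the difference is only in how many loads the compiler emits.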

The issue with forwprop is to somehow avoid quadratic behavior when
searching the chain, which will be difficult given its structure.  One
possibility would be to perform a forward search from BIT_INSERT_EXPRs
with a default def arg and mark the last BIT_INSERT_EXPR in a single-use
chain as to be processed.  Or declare it not a problem.
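
A rough sketch of that forward search, on a toy model rather than on
GCC's IR.  The struct, its fields and both function names are made up
for illustration, and the vector is assumed to be at most 64 bits wide
so that coverage fits in one mask.  Starting from an insert whose vector
operand is a default def, it follows single-use links to the last insert
and then checks once, walking backwards, whether the chain writes every
bit.  Each statement is visited a constant number of times.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one BIT_INSERT_EXPR statement (not GCC's real IR). */
struct insert {
    int src;          /* index of the insert defining the vector operand,
                         or -1 when the operand is a default def */
    unsigned bitpos;  /* first bit written; assumed < 64 */
    unsigned bitsize; /* number of bits written */
    int n_uses;       /* number of uses of the result */
    int single_user;  /* index of the insert using the result as its
                         vector operand, or -1 if the user is not one */
};

/* Mask with the low 'bits' bits set, avoiding a shift by 64. */
static uint64_t low_mask(unsigned bits)
{
    return bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/* Walk forward from 'head' while the result has exactly one use and
   that use is another insert.  Returns the index of the last insert. */
int find_chain_tail(const struct insert *stmts, int head)
{
    int cur = head;
    while (stmts[cur].n_uses == 1 && stmts[cur].single_user >= 0)
        cur = stmts[cur].single_user;
    return cur;
}

/* Walk backwards from 'tail' and report whether the inserts together
   cover all 'vec_bits' bits, i.e. whether they amount to a full CTOR. */
bool chain_is_full_ctor(const struct insert *stmts, int tail,
                        unsigned vec_bits)
{
    uint64_t covered = 0;
    for (int cur = tail; cur >= 0; cur = stmts[cur].src)
        covered |= low_mask(stmts[cur].bitsize) << stmts[cur].bitpos;
    return covered == low_mask(vec_bits);
}
```

For the two-insert chain in the dump (bit positions 0 and 32, 32 bits
each, 64-bit vector) the tail is the second insert and the chain counts
as a full CTOR.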
