https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105053

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
One notable difference is that the first loop is detected to require peeling
for gaps while the second one is not (probably an artifact of the low trip
count).
The second difference is that the first loop is detected as a reduction path
while the second one is detected as a reduction chain.

OK, so I think I see what goes wrong.  We elided the load permutation but
the load is still biased wrongly.

  vectp.67_112 = _93 + 8;

  <bb 12> [local count: 405853744]:
  # i_98 = PHI <i_44(21), 0(11)>
  # prephitmp_7 = PHI <prephitmp_97(21), 0(11)>
  # ivtmp_31 = PHI <ivtmp_37(21), 4(11)>
  # vectp.66_105 = PHI <vectp.66_68(21), vectp.67_112(11)>
  # vect_prephitmp_7.71_61 = PHI <vect__26.72_62(21), { 0, 0, 0, 0 }(11)>
  # ivtmp_58 = PHI <ivtmp_81(21), 0(11)>
  _3 = (long unsigned int) i_98;
  _59 = _3 * 16;
  _60 = _93 + _59;
  _106 = MEM <vector(2) int> [(const int &)vectp.66_105];
  vect__54.68_113 = {_106, { 0, 0 }};
  vectp.66_95 = vectp.66_105 + 16;
  _89 = MEM <vector(2) int> [(const int &)vectp.66_95];
  vect__54.69_90 = {_89, { 0, 0 }};
  vect__51.70_78 = VEC_PERM_EXPR <vect__54.68_113, vect__54.69_90, { 0, 1, 4, 5 }>;

possibly because the SLP representative is unchanged when we transform

t.C:17:16: note:   node 0x3382280 (max_nunits=4, refcnt=2) const vector(4) int
t.C:17:16: note:   op template: _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 0 _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 1 _51 = MEM[(const int &)_60 + 8];
t.C:17:16: note:        load permutation { 1 0 }

into

t.C:17:16: note:   node 0x3382280 (max_nunits=4, refcnt=1) const vector(4) int
t.C:17:16: note:   op template: _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        stmt 0 _51 = MEM[(const int &)_60 + 8];
t.C:17:16: note:        stmt 1 _54 = MEM[(const int &)_60 + 12];
t.C:17:16: note:        load permutation { 0 1 }

during SLP optimize.
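
To make the before/after dumps easier to compare, here is a toy model of the
transformation. It is only a sketch: ToyNode, ScalarLoad and
elide_load_permutation are hypothetical names, not GCC data structures or
functions, and the model assumes that lane i of the node reads group element
load_perm[i]. It shows only what the dumps show: after the lanes are reordered
so the permutation becomes the identity, the op template still names _54
(offset 12) while lane 0 is now _51 (offset 8).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for one scalar load in the dump: its SSA name and
// its byte offset from the common base (_60 in the dump).
struct ScalarLoad {
    std::string name;
    std::size_t offset;
};

// Hypothetical stand-in for an SLP load node. This is NOT GCC's slp_tree.
struct ToyNode {
    std::vector<ScalarLoad> stmts;    // lane order
    std::vector<unsigned> load_perm;  // lane i reads group element load_perm[i]
    ScalarLoad representative;        // the "op template" line of the dump
};

// Reorder the lanes into memory order so that load_perm becomes the identity.
// The representative is deliberately left untouched, mirroring the dump.
void elide_load_permutation(ToyNode &node) {
    std::vector<ScalarLoad> reordered(node.stmts.size());
    for (std::size_t lane = 0; lane < node.stmts.size(); ++lane)
        reordered[node.load_perm[lane]] = node.stmts[lane];
    node.stmts = reordered;
    for (std::size_t lane = 0; lane < node.load_perm.size(); ++lane)
        node.load_perm[lane] = static_cast<unsigned>(lane);
}

// Build the node as it appears in the first dump: stmt 0 is _54 at +12,
// stmt 1 is _51 at +8, load permutation { 1 0 }, op template _54.
ToyNode make_node_from_dump() {
    ScalarLoad l54{"_54", 12};
    ScalarLoad l51{"_51", 8};
    return ToyNode{{l54, l51}, {1, 0}, l54};
}
```

Calling elide_load_permutation on the result of make_node_from_dump leaves
stmts as { _51, _54 } and load_perm as { 0 1 }, while representative.offset
stays at 12. Any consumer in this toy model that read the start offset from
the representative instead of from lane 0 would be off by one element.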
