https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110630

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2023-07-12
     Ever confirmed|0                           |1
           Keywords|                            |missed-optimization
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
typedef float __attribute__((vector_size(32))) v8f32;

v8f32 f(v8f32 a, v8f32 b)
{
  /* Check that we vectorize this CTOR without any loads.  */
  return (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
      a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

fails to optimally vectorize with SSE2 on x86_64 (would need AVX2).
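For reference, the CTOR in the testcase is lane-for-lane identical to a single whole-vector `a + b`, which is what an optimal vectorization should reduce it to. The following is a minimal sketch using GCC's generic vector extensions. The helper names `f_ctor`, `f_vec` and `v8f32_equal` are hypothetical and exist only for illustration. When built without `-mavx`, GCC lowers the 32-byte vectors and may emit a `-Wpsabi` note about passing them by value.

```c
#include <stdbool.h>

typedef float __attribute__((vector_size(32))) v8f32;

/* Hypothetical helper: the lane-by-lane constructor from the testcase.  */
static v8f32 f_ctor(v8f32 a, v8f32 b)
{
  return (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
                 a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

/* Hypothetical helper: the single whole-vector add the CTOR is equivalent to.  */
static v8f32 f_vec(v8f32 a, v8f32 b)
{
  return a + b;
}

/* Hypothetical helper: compare all eight lanes of two vectors.  */
static bool v8f32_equal(v8f32 x, v8f32 y)
{
  for (int i = 0; i < 8; i++)
    if (x[i] != y[i])
      return false;
  return true;
}
```

Both functions compute the same values; the difference is only in the code the vectorizer manages to produce for the first form.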

It works OK when the ABI issues are avoided, as in the following, so fixing
this is probably of low importance.

typedef float __attribute__((vector_size(32))) v8f32;
v8f32 a, b;
v8f32 res;
void f()
{
  /* Check that we vectorize this CTOR without any loads.  */
  res = (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
      a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

The issue on x86_64 is that we run into

t.c:6:10: note:   vectorizing permutation op0[0] op0[1] op0[2] op0[3] op0[4] op0[5] op0[6] op0[7]
t.c:6:10: note:   vectorizing permutation op0[0] op0[1] op0[2] op0[3] op0[4] op0[5] op0[6] op0[7]
t.c:6:10: note:   as vops0[0][0] vops0[0][1] vops0[0][2] vops0[0][3], vops0[0][4] vops0[0][5] vops0[0][6] vops0[0][7]
t.c:6:10: missed:   unsupported vect permute { 4 5 6 7 }
t.c:6:10: note:   Building vector operands of 0x47865f0 from scalars instead
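The rejected permutation `{ 4 5 6 7 }` selects the upper four lanes of an eight-lane vector, so it is an extraction of the high half rather than a general shuffle. The sketch below, again using GCC vector extensions, contrasts the element-by-element fallback with treating it as one half-vector extract. The helper names are hypothetical, and the code only illustrates the semantics, not how the vectorizer actually emits code.

```c
#include <string.h>

typedef float __attribute__((vector_size(32))) v8f32;
typedef float __attribute__((vector_size(16))) v4f32;

/* Hypothetical helper: permutation { 4 5 6 7 } done element by element,
   roughly what "building vector operands from scalars" amounts to.  */
static v4f32 high_half_from_scalars(v8f32 v)
{
  return (v4f32){v[4], v[5], v[6], v[7]};
}

/* Hypothetical helper: the same permutation viewed as a subvector
   extraction, copying the upper 16 bytes of the 32-byte vector as a
   single v4f32 (lane i of a vector lives at byte offset i * 4).  */
static v4f32 high_half_extract(v8f32 v)
{
  v4f32 hi;
  memcpy(&hi, (const char *)&v + sizeof(v4f32), sizeof hi);
  return hi;
}
```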

The issue on mips with -mpaired-single is the same:

t.c:6:10: note:   vectorizing permutation op0[0] op0[1] op0[2] op0[3]
t.c:6:10: note:   vectorizing permutation op0[0] op0[1] op0[2] op0[3]
t.c:6:10: note:   as vops0[0][0] vops0[0][1], vops0[0][2] vops0[0][3]
t.c:6:10: missed:   unsupported vect permute { 2 3 }

Interestingly, though, it doesn't emit any psABI warning, so maybe it has a
defined ABI for the V4SFmode vectors.

The fix is for vectorizable_slp_permutation to try a vector extraction as well
or, for BLKmode vector operands, to simply allow this to go through.
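One way to picture "try a vector extraction" is a check that a permutation mask is a contiguous, aligned subvector extraction: both rejected masks, `{ 4 5 6 7 }` on x86_64 and `{ 2 3 }` on mips, are the high half of their input. The sketch below is a standalone illustration under that assumption. `perm_is_subvector_extract` is a hypothetical helper and not GCC's API; a real change to vectorizable_slp_permutation would work on GCC's internal representation and target hooks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: return true if MASK (NOUT entries selecting lanes
   from an NIN-lane input) picks NOUT consecutive lanes starting at a
   multiple of NOUT, storing the first selected lane in *START.
   E.g. { 4 5 6 7 } from 8 lanes -> start 4, { 2 3 } from 4 lanes -> start 2.  */
static bool perm_is_subvector_extract(const unsigned *mask, size_t nout,
                                      size_t nin, size_t *start)
{
  if (nout == 0 || nout > nin || nin % nout != 0)
    return false;
  size_t first = mask[0];
  /* The extracted piece must be aligned and fit inside the input.  */
  if (first % nout != 0 || first + nout > nin)
    return false;
  for (size_t i = 1; i < nout; i++)
    if (mask[i] != first + i)
      return false;
  *start = first;
  return true;
}
```

Requiring alignment to a multiple of the output width keeps the check to whole-subvector pieces (low or high half, quarters, and so on), which are the cases a target can typically extract without a general shuffle.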
