https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110630
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                   |ASSIGNED
   Last reconfirmed|                              |2023-07-12
     Ever confirmed|0                             |1
           Keywords|                              |missed-optimization
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---

typedef float __attribute__((vector_size(32))) v8f32;

v8f32 f(v8f32 a, v8f32 b)
{
  /* Check that we vectorize this CTOR without any loads.  */
  return (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
                 a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

fails to optimally vectorize with SSE2 on x86_64 (would need AVX2).  It works
OK when avoiding ABI issues like with the following so the importance of
fixing this might be low.

typedef float __attribute__((vector_size(32))) v8f32;

v8f32 a, b;
v8f32 res;

void f()
{
  /* Check that we vectorize this CTOR without any loads.  */
  res = (v8f32){a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3],
                a[4] + b[4], a[5] + b[5], a[6] + b[6], a[7] + b[7]};
}

the issue on x86_64 is that we run into

t.c:6:10: note: vectorizing permutation op0[0] op0[1] op0[2] op0[3] op0[4] op0[5] op0[6] op0[7]
t.c:6:10: note: vectorizing permutation op0[0] op0[1] op0[2] op0[3] op0[4] op0[5] op0[6] op0[7]
t.c:6:10: note: as vops0[0][0] vops0[0][1] vops0[0][2] vops0[0][3], vops0[0][4] vops0[0][5] vops0[0][6] vops0[0][7]
t.c:6:10: missed: unsupported vect permute { 4 5 6 7 }
t.c:6:10: note: Building vector operands of 0x47865f0 from scalars instead

the issue on mips with -mpaired-single is the same:

t.c:6:10: note: vectorizing permutation op0[0] op0[1] op0[2] op0[3]
t.c:6:10: note: vectorizing permutation op0[0] op0[1] op0[2] op0[3]
t.c:6:10: note: as vops0[0][0] vops0[0][1], vops0[0][2] vops0[0][3]
t.c:6:10: missed: unsupported vect permute { 2 3 }

but interestingly it doesn't emit any psABI warning so maybe it has a defined
ABI for the V4SFmode vectors.
The fix is for vectorizable_slp_permutation to try a vector extraction as
well, or, for BLKmode vector operands, to simply allow this to go through.
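
To illustrate what "a vector extraction" means for the rejected permute
{ 4 5 6 7 }, here is a source-level sketch only, not the vectorizer change
itself.  Selecting lanes 4..7 of an eight-lane vector amounts to taking its
upper four-lane half, so no general shuffle is needed.  The helper names
extract_lo, extract_hi and add_by_halves, and the v4f32 typedef, are
hypothetical and do not come from the report; the sketch assumes GNU C vector
extensions, and it passes the 32-byte vectors by pointer to sidestep the ABI
issue mentioned above.

```c
#include <string.h>

typedef float __attribute__((vector_size(32))) v8f32;
typedef float __attribute__((vector_size(16))) v4f32;  /* assumed half-width type */

/* Hypothetical helper: lanes 0..3 of *src, i.e. the selection { 0 1 2 3 }.  */
static inline v4f32 extract_lo(const v8f32 *src)
{
  v4f32 lo;
  memcpy(&lo, src, sizeof lo);
  return lo;
}

/* Hypothetical helper: lanes 4..7 of *src, i.e. the selection { 4 5 6 7 },
   written as a copy of the upper half rather than as a permute.  */
static inline v4f32 extract_hi(const v8f32 *src)
{
  v4f32 hi;
  memcpy(&hi, (const char *)src + sizeof hi, sizeof hi);
  return hi;
}

/* Lane-wise *res = *a + *b, done as two four-lane additions on the
   extracted halves and then stored back half by half.  */
static inline void add_by_halves(const v8f32 *a, const v8f32 *b, v8f32 *res)
{
  v4f32 lo = extract_lo(a) + extract_lo(b);
  v4f32 hi = extract_hi(a) + extract_hi(b);
  memcpy(res, &lo, sizeof lo);
  memcpy((char *)res + sizeof lo, &hi, sizeof hi);
}
```

Calling add_by_halves(&a, &b, &res) on the globals from the second testcase
computes the same lanes as the CTOR there.  Whether the compiler turns the
memcpy calls into plain register moves depends on target and options.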