https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure what you are after, we are costing for the scalar cost

0x4ad73f0 width_2(D) 1 times scalar_store costs 6 in body
0x4ad73f0 height_4(D) 1 times scalar_store costs 6 in body
0x4ad73f0 x_6(D) 1 times scalar_store costs 6 in body
0x4ad73f0 y_8(D) 1 times scalar_store costs 6 in body

and for the vectorized cost

0x4bd8330 width_2(D) 1 times vec_construct costs 8 in prologue
0x4bd8330 width_2(D) 1 times vector_store costs 16 in body
0x4bd8330 x_6(D) 1 times vec_construct costs 8 in prologue
0x4bd8330 x_6(D) 1 times vector_store costs 16 in body
t.i:17:3: note: Cost model analysis:
  Vector inside of basic block cost: 32
  Vector prologue cost: 16
  Vector epilogue cost: 0
  Scalar cost of basic block: 24
t.i:17:3: missed: not vectorized: vectorization is not profitable.

I assume skylake-avx512 uses skylake_cost.  The only issue I see is
fixed use of [0]/[2] (SImode/SFmode) also because the cost tables do not
have entries for DImode int_load/store.  Skylake costs are odd here:

  {4, 4, 4},            /* cost of loading integer registers
                           in QImode, HImode and SImode.
                           Relative to reg-reg move (2).  */
  {6, 6, 3},            /* cost of storing integer registers */

Why is SImode store cost 3?  (looks like "benchmark" random-number
generator?)

  {6, 6, 6, 10, 20},    /* cost of loading SSE registers
                           in 32,64,128,256 and 512-bit */
  {6, 6, 6, 10, 20},    /* cost of unaligned loads.  */
  {8, 8, 8, 12, 24},    /* cost of storing SSE registers
                           in 32,64,128,256 and 512-bit */
  {8, 8, 8, 8, 16},     /* cost of unaligned stores.  */

again, unaligned SSE stores are cheaper than aligned ones for 256 and
512 bits?!

In the end the scalar code is not vectorized because of vector construction cost and because the vector store cost is higher than the scalar store cost which oddly is too cheap (3 vs. expected 6). So - INVALID?
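
For reference, the arithmetic behind the "not profitable" verdict can be replayed by hand.  The following is only a sketch, not GCC's code: the per-statement numbers are copied from the dump above, while the struct, the function names and the exact comparison (reject when prologue + body + epilogue is not strictly below the scalar cost) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One dump line of the form "<count> times <kind> costs <cost> in <where>".
   The struct is hypothetical; only the numbers come from the dump.  */
struct cost_entry {
    int count;  /* the "N times" field, kept for reference only */
    int cost;   /* the "costs N" field, already the total for the line */
};

/* Four scalar stores: width, height, x, y.  */
static const struct cost_entry scalar_body[] = {
    {1, 6}, {1, 6}, {1, 6}, {1, 6}
};

/* Two vector stores in the body.  */
static const struct cost_entry vector_body[] = {
    {1, 16}, {1, 16}
};

/* Two vec_construct operations in the prologue.  */
static const struct cost_entry vector_prologue[] = {
    {1, 8}, {1, 8}
};

/* Sum the "costs N" field over a list of dump lines.  */
int sum_costs(const struct cost_entry *e, size_t n)
{
    int total = 0;

    assert(e != NULL);
    for (size_t i = 0; i < n; i++)
        total += e[i].cost;
    return total;
}

int scalar_total(void)
{
    return sum_costs(scalar_body, sizeof scalar_body / sizeof scalar_body[0]);
}

int vector_body_total(void)
{
    return sum_costs(vector_body, sizeof vector_body / sizeof vector_body[0]);
}

int vector_prologue_total(void)
{
    return sum_costs(vector_prologue,
                     sizeof vector_prologue / sizeof vector_prologue[0]);
}

/* Assumed rule: vectorize only if the full vector cost is strictly
   lower than the scalar cost.  The epilogue cost is 0 in the dump.  */
bool bb_vectorization_profitable(void)
{
    int vector_epilogue = 0;
    int vector_total = vector_prologue_total() + vector_body_total()
                       + vector_epilogue;

    return vector_total < scalar_total();
}
```

With these inputs the scalar side sums to 24 and the vector side to 16 + 32 + 0 = 48, so bb_vectorization_profitable() returns false, matching the "missed: not vectorized" line in the dump.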