https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #43084|0                           |1
        is obsolete|                            |

--- Comment #30 from Richard Biener <rguenth at gcc dot gnu.org> ---
Created attachment 43238
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43238&action=edit
updated patch for SLP costing

You are right, the patch contained several errors.  The multiple_of_p check
is supposed to handle the case where we know all vectors will be equal, like
when group_size is two and const_nunits is 4.  The arguments were swapped.
Also the loop over the elements was bogus.

I've corrected this with the attached updated patch.  This now costs two
vector constructions for the testcase as expected, but still:

t.c:32:12: note: Cost model analysis:
  Vector inside of basic block cost: 32
  Vector prologue cost: 64
  Vector epilogue cost: 0
  Scalar cost of basic block: 192
t.c:32:12: note: Basic block will be vectorized using SLP

That's for two aligned stores and two 8 element vector constructions.  We're
offsetting 16 scalar stores after all...  They each seem to cost 12 while
an aligned vector store costs 16.  And the vector constructions cost 32
each (8 times an SSE op costing 4, aka "element insert").

The only thing I notice is that in

  ix86_vec_cost (machine_mode mode, int cost, bool parallel)
  {
    if (!VECTOR_MODE_P (mode))
      return cost;

    if (!parallel)
      return cost * GET_MODE_NUNITS (mode);
    if (GET_MODE_BITSIZE (mode) == 128
        && TARGET_SSE_SPLIT_REGS)
      return cost * 2;
    if (GET_MODE_BITSIZE (mode) > 128
        && TARGET_AVX128_OPTIMAL)
      return cost * GET_MODE_BITSIZE (mode) / 128;
    return cost;
  }

all the pessimizing for TARGET_SSE_SPLIT_REGS/TARGET_AVX128_OPTIMAL isn't
applied to the !parallel case.  But those adjustments wouldn't apply to
AVX512 anyway, AFAICS.
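
To make the numbers above easier to check, here is a minimal standalone sketch of the same control flow.  It is not GCC code: struct vec_mode, struct tuning and model_vec_cost are hypothetical stand-ins for machine_mode, the TARGET_* tuning flags and ix86_vec_cost, and the mode properties are passed in as plain integers.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a machine mode; not GCC's machine_mode.  */
struct vec_mode
{
  bool is_vector;   /* models VECTOR_MODE_P (mode) */
  int nunits;       /* models GET_MODE_NUNITS (mode) */
  int bitsize;      /* models GET_MODE_BITSIZE (mode) */
};

/* Hypothetical stand-in for the two target tuning flags.  */
struct tuning
{
  bool sse_split_regs;   /* models TARGET_SSE_SPLIT_REGS */
  bool avx128_optimal;   /* models TARGET_AVX128_OPTIMAL */
};

/* Same branch order as the function quoted in the comment.  */
int
model_vec_cost (struct vec_mode mode, struct tuning tune, int cost,
                bool parallel)
{
  if (!mode.is_vector)
    return cost;

  /* Element-wise operation: returns before either tuning check is
     reached, so the tuning flags cannot change the result.  */
  if (!parallel)
    return cost * mode.nunits;

  if (mode.bitsize == 128 && tune.sse_split_regs)
    return cost * 2;
  if (mode.bitsize > 128 && tune.avx128_optimal)
    return cost * mode.bitsize / 128;
  return cost;
}
```

With nunits == 8, cost == 4 and parallel == false this returns 32 whatever the tuning flags are, which matches the "32 each" for the vector constructions.  The totals in the dump then follow as 2 * 16 = 32 for the aligned stores, 2 * 32 = 64 for the constructions, and 16 * 12 = 192 for the scalar stores, so the vector side sums to 96 against 192 scalar.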