https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037
--- Comment #23 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #21)
> So after r257453 we improve the situation pre-IVOPTs to just
> 6 IVs (duplicated but trivially equivalent) plus one counting IV. But then
> when SLP is enabled IVOPTs comes along and adds another 4 IVs which makes us
> spill... (for AVX256, so you need -march=core-avx2 for example).
>
> Bin, any chance you can take a look? In the IVO dump I see
>
> target_avail_regs 15
> target_clobbered_regs 9
> target_reg_cost 4
> target_spill_cost 8
> regs_used 3
> ^^^
>
> and regs_used looks awfully low to me. The loop has even more IVs initially
> plus variable steps for that IVs which means we need two regs per IV.
>
> There doesn't seem to be a way to force IVOPTs to use the minimal set of IVs?
> Or just use the original set, removing the obvious redundancies? There is
> a microarchitectural issue left with the vectorization but the spilling
> obscures the look quite a bit :/

Sure, I will have a look based on your commit. Thanks