https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121536

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
     Ever confirmed|0                             |1
   Last reconfirmed|                              |2025-08-13
                 CC|                              |tnfchris at gcc dot gnu.org
             Status|UNCONFIRMED                   |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |tnfchris at gcc dot gnu.org

--- Comment #1 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
The main loop originally picks VNx16QI and the costing of them looks the same:

: note: Original vector body cost = 6
: note: Vector loop iterates at most 0 times
: note: Scalar issue estimate:
: note: load operations = 0
: note: store operations = 0
: note: general operations = 2
: note: reduction latency = 2
: note: estimated min cycles per iteration = 2.000000
: note: estimated cycles per vector iteration (for VF 2) = 4.000000
: note: SVE issue estimate:
: note: load operations = 0
: note: store operations = 0
: note: general operations = 2
: note: predicate operations = 2
: note: reduction latency = 4
: note: estimated cycles per iteration to rename = 1.000000
: note: estimated min cycles per iteration without predication = 4.000000
: note: estimated min cycles per iteration for predication = 1.000000
: note: estimated min cycles per iteration = 4.000000
: note: Cost model analysis:
  Vector inside of loop cost: 6
  Vector prologue cost: 2
  Vector epilogue cost: 6
  Scalar iteration cost: 2
  Scalar outside cost: 2
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0

vs

: note: Original vector body cost = 6
: note: Vector loop iterates at most 0 times
: note: Scalar issue estimate:
: note: load operations = 0
: note: store operations = 0
: note: general operations = 2
: note: reduction latency = 2
: note: estimated min cycles per iteration = 2.000000
: note: estimated cycles per vector iteration (for VF 2) = 4.000000
: note: SVE issue estimate:
: note: load operations = 0
: note: store operations = 0
: note: general operations = 2
: note: predicate operations = 2
: note: reduction latency = 4
: note: estimated cycles per iteration to rename = 1.000000
: note: estimated min cycles per iteration without predication = 4.000000
: note: estimated min cycles per iteration for predication = 1.000000
: note: estimated min cycles per iteration = 4.000000
: note: Cost model analysis:
  Vector inside of loop cost: 6
  Vector prologue cost: 2
  Vector epilogue cost: 6
  Scalar iteration cost: 4
  Scalar outside cost: 2
  Vector outside cost: 8
  prologue iterations: 0
  epilogue iterations: 0
  Minimum number of vector iterations: 4
  Calculated minimum iters for profitability: 8

but the conclusion is different. The old code still thinks the vector loop
iterates 4 times whereas the new one thinks it doesn't...

But this seems like a backend issue so I'll take a look.

Mine.
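
As a side check, the two "Cost model analysis" summaries quoted above can be compared field by field. The sketch below is a hypothetical helper, not part of GCC: the names FIRST, SECOND, parse_summary and diff_summaries are made up for illustration, and the values are copied from the two dumps in this comment. Among the fields present in both summaries, only "Scalar iteration cost" differs (2 vs 4); whether that is related to the different conclusion is not established here.

```python
# Hypothetical helper (not part of GCC): compare the two cost-model
# summaries quoted in the comment, field by field.

FIRST = """\
Vector inside of loop cost: 6
Vector prologue cost: 2
Vector epilogue cost: 6
Scalar iteration cost: 2
Scalar outside cost: 2
Vector outside cost: 8
prologue iterations: 0
epilogue iterations: 0"""

SECOND = """\
Vector inside of loop cost: 6
Vector prologue cost: 2
Vector epilogue cost: 6
Scalar iteration cost: 4
Scalar outside cost: 2
Vector outside cost: 8
prologue iterations: 0
epilogue iterations: 0
Minimum number of vector iterations: 4
Calculated minimum iters for profitability: 8"""


def parse_summary(text):
    """Turn 'name: value' lines into a dict of integer fields."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = int(value)
    return fields


def diff_summaries(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ.

    A field missing from one side is reported with None on that side.
    """
    names = list(a) + [n for n in b if n not in a]
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


if __name__ == "__main__":
    delta = diff_summaries(parse_summary(FIRST), parse_summary(SECOND))
    for name, (first, second) in delta.items():
        print(f"{name}: {first} vs {second}")
```

The two profitability lines show up in the diff only because the first quoted summary does not include them.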