https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85747
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2021-12-27
           Severity|normal                      |enhancement
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW

--- Comment #9 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
size: 13-3, last_iteration: 13-3
  Loop size: 13
  Estimated size after unrolling: 40
Not unrolling loop 1: it is not innermost and code would grow.

There are a few others like this one.

Note LLVM is able even to handle:

    template <class It>
    constexpr void sort(It first, It last) {
      for (; first != last; ++first) {
        auto it = first;
        ++it;
        for (; it != last; ++it) {
          if (*it < *first) {
            auto tmp = *it;
            *it = *first;
            *first = tmp;
          }
        }
      }
    }

    static int generate() {
      int a[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21};
      a[5] = 55;
      sort(a + 0, a + 21);
      return a[0] + a[6] + a[1] + a[2] + a[3] + a[4];
    }

I suspect the cost estimate it does for the loop is the removal of the load of
a[i] knowing that a is fully written to.
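
For reference, here is a minimal sketch of the constant that a fully folded generate() would reduce to. It assumes C++14 or later and reuses the test case quoted above, with one change that is not in the original report: generate() is renamed to the hypothetical generate_constexpr() and marked constexpr so the front end evaluates it at compile time. This only illustrates the expected result; it does not exercise the unrolling heuristics discussed in this bug.

```cpp
// Same exchange sort as in the test case above.
template <class It>
constexpr void sort(It first, It last) {
  for (; first != last; ++first) {
    auto it = first;
    ++it;
    for (; it != last; ++it) {
      if (*it < *first) {
        auto tmp = *it;
        *it = *first;
        *first = tmp;
      }
    }
  }
}

// Hypothetical constexpr variant of generate(); the name and the
// constexpr specifier are assumptions added for this illustration.
constexpr int generate_constexpr() {
  int a[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21};
  a[5] = 55;
  // Sorts a[0..20]; the 55 moves to a[20], so a[5] becomes 6 and a[6] becomes 7.
  sort(a + 0, a + 21);
  return a[0] + a[6] + a[1] + a[2] + a[3] + a[4];
}

// 0 + 7 + 1 + 2 + 3 + 4 == 17
static_assert(generate_constexpr() == 17,
              "a fully folded generate() should return 17");
```

With the original non-constexpr generate(), the equivalent outcome at -O2 or -O3 would be a function body that just returns 17, which is what the comment above reports LLVM producing.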