https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681
--- Comment #7 from alalaw01 at gcc dot gnu.org ---
Looking at where the peeling happens: in both the -DFOO=0 and -DFOO=1 cases, 107.ch2 peels the inner loop header, so there is an i<=max test in the outer loop before the inner loop. However, in the -DFOO=1 case, this is dominated by the extra i>max test (the one that breaks out of the outer loop), so 110.dom2 removes the peeled i<=max.

Thus, just before sccp, in the -DFOO=0 case, we have:

  <bb 3>:
  # i_25 = PHI <i_23(11), 1(2)>
  # j_26 = PHI <j_16(11), 0(2)>
  max_7 = 1 << j_26;
  if (max_7 >= i_25)
    goto <bb 4>;
  else
    goto <bb 5>; //skip inner loop

  <bb 4>: //inner loop header
  # i_2 = PHI <i_25(7), i_15(9)>
  _8 = (long unsigned int) i_2;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _12 = *_11;
  _13 = _12 + j_26;
  *_11 = _13;
  i_15 = i_2 + 1;
  if (max_7 >= i_15)
    goto <bb 4>; //cleaned, actually via latch
  else
    goto <bb 10>;

Note that the inner loop exits if !(max_7 >= i_15), and when we reach the inner loop, we already know that (max_7 >= i_25). Whereas in the -DFOO=1 case:

  <bb 2>:
  goto <bb 4>;

  <bb 3>: //in outer loop
  max_7 = 1 << j_17;
  if (max_7 < i_32)
    goto <bb 7>;
  else
    goto <bb 4>;

  <bb 4>: //outer loop header
  # max_24 = PHI <max_7(9), 1(2)>
  # i_22 = PHI <i_32(9), 1(2)>
  # j_23 = PHI <j_17(9), 0(2)>

  <bb 5>: //inner loop header
  # i_27 = PHI <i_22(4), i_16(10)>
  _8 = (long unsigned int) i_27;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _13 = *_11;
  _14 = _13 + j_23;
  *_11 = _14;
  i_16 = i_27 + 1;
  if (i_16 <= max_24)
    goto <bb 5>; //cleaned, actually via latch
  else
    goto <bb 6>;

Here the inner loop exits if !(max_24 >= i_16), but max_24 is defined as PHI<max_7, 1>, and we only know that max_7 >= i_32 (i.e. that the max_7 < i_32 exit was not taken) if we came round the outer loop, rather than jumping into the first iteration from bb 2. Hence the complex niter

  (i_22 + 1) + (i_22 <= max_24 ? (int) ((unsigned int) max_24 - (unsigned int) i_22) : 0)

because i_22 <= max_24 has not obviously been tested.

This structure is essentially created by dom2, when it jump-threads (?) the first "if (i>max) break" out of the loop, such that the outer loop now executes "if (i>max) break" after the inner loop (rather than testing "if (i>max) break" before the inner loop, as it still did following 107.ch2). So, as an alternative, tweaking the jump-threading/loop-peeling heuristics might help (?).