https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67681

--- Comment #7 from alalaw01 at gcc dot gnu.org ---
Looking at where the peeling happens: in both -DFOO=0 and -DFOO=1 cases,
107.ch2 peels the inner loop header, so there is an i<=max test in the outer
loop before the inner loop. However, in the -DFOO=1 case, this is dominated by
the extra i>max test (that breaks out of the outer loop), so 110.dom2 removes
the peeled i<=max.
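
For readers unfamiliar with what ch2 does here, a minimal C sketch of the transformation at source level, under assumptions: the function names are hypothetical, and the body is reconstructed from the dumps in this comment rather than taken from the PR's testcase. The top-tested loop is turned into a guard test followed by a bottom-tested loop, and it is that guard (the peeled i<=max) that dom2 later removes in the -DFOO=1 case.

```c
/* Hypothetical source-level shape of the inner loop: the i <= max test
   runs before every iteration, including the first. */
int inner_top_tested(int *data, int i, int max, int j)
{
    for (; i <= max; i++)
        data[i] += j;
    return i;   /* value of i on exit */
}

/* Shape after loop header copying: one copy of the test guards entry,
   and the loop itself only tests at the bottom. */
int inner_header_copied(int *data, int i, int max, int j)
{
    if (i <= max) {             /* the peeled i<=max test */
        do {
            data[i] += j;
            i++;
        } while (i <= max);     /* exit test, now at the bottom */
    }
    return i;
}
```

Both functions behave identically for any input. If the guard is dropped without proof that i <= max holds on entry, the do-while body runs at least once regardless, which is the situation the niter analysis has to account for in the -DFOO=1 case.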

Thus, just before sccp, in the -DFOO=0 case, we have:

  <bb 3>:
  # i_25 = PHI <i_23(11), 1(2)>
  # j_26 = PHI <j_16(11), 0(2)>
  max_7 = 1 << j_26;
  if (max_7 >= i_25)
    goto <bb 4>;
  else
    goto <bb 5>; //skip inner loop

  <bb 4>: //inner loop header
  # i_2 = PHI <i_25(7), i_15(9)>
  _8 = (long unsigned int) i_2;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _12 = *_11;
  _13 = _12 + j_26;
  *_11 = _13;
  i_15 = i_2 + 1;
  if (max_7 >= i_15)
    goto <bb 4>; //cleaned, actually via latch
  else
    goto <bb 10>;

Note that the inner loop exits if !(max_7 >= i_15), and when we hit the inner
loop, we know that (max_7 >= i_25). In the -DFOO=1 case, by contrast:

  <bb 2>:
  goto <bb 4>;

  <bb 3>: //in outer loop
  max_7 = 1 << j_17;
  if (max_7 < i_32)
    goto <bb 7>;
  else
    goto <bb 4>;

  <bb 4>: //outer loop header
  # max_24 = PHI <max_7(9), 1(2)>
  # i_22 = PHI <i_32(9), 1(2)>
  # j_23 = PHI <j_17(9), 0(2)>

  <bb 5>: //inner loop header
  # i_27 = PHI <i_22(4), i_16(10)>
  _8 = (long unsigned int) i_27;
  _9 = _8 * 4;
  _11 = data_10(D) + _9;
  _13 = *_11;
  _14 = _13 + j_23;
  *_11 = _14;
  i_16 = i_27 + 1;
  if (i_16 <= max_24)
    goto <bb 5>; //cleaned, actually via latch
  else
    goto <bb 6>;

Here the inner loop exits if !(max_24 >= i_16), but max_24 is defined as
PHI<max_7, 1>, and the max_7<i_32 test (which establishes max_7 >= i_32 on the
edge into bb 4) is only executed if we came round the outer loop, rather
than jumping into the first iteration from bb 2. Hence the complex niter

(i_22 + 1) + (i_22 <= max_24 ? (int) ((unsigned int) max_24 - (unsigned int)
i_22) : 0)

because i_22 <= max_24 has not obviously been tested.
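
To make the role of the conditional concrete, here is a small C sketch that checks the expression above against the loop it describes. It rests on an assumption of mine: that the expression is the value of i when the bottom-tested inner loop exits (it is quoted above as the niter result). All function names are hypothetical, and the memory accesses are left out because they do not affect the count.

```c
#include <assert.h>

/* Run the bottom-tested inner loop from the -DFOO=1 dump (memory
   accesses omitted) and return the value of i on exit. */
int final_i_by_running(int i, int max)
{
    do {
        i = i + 1;          /* i_16 = i_27 + 1 */
    } while (i <= max);     /* if (i_16 <= max_24) goto <bb 5> */
    return i;
}

/* The expression quoted above, with i standing for i_22 and max for max_24. */
int final_i_closed_form(int i, int max)
{
    return (i + 1)
           + (i <= max ? (int) ((unsigned int) max - (unsigned int) i) : 0);
}

/* What the expression collapses to once i <= max is known on entry. */
int final_i_if_guarded(int max)
{
    return max + 1;
}

/* Returns 1 if the closed form matches the loop for all lo <= i, max <= hi,
   and the guarded form matches whenever i <= max. Intended for small ranges. */
int forms_agree(int lo, int hi)
{
    for (int i = lo; i <= hi; i++) {
        for (int max = lo; max <= hi; max++) {
            if (final_i_by_running(i, max) != final_i_closed_form(i, max))
                return 0;
            if (i <= max && final_i_if_guarded(max) != final_i_closed_form(i, max))
                return 0;
        }
    }
    return 1;
}
```

With a dominating i <= max test, as in the -DFOO=0 case, the conditional arm is always taken and the whole thing reduces to max + 1. Without it, the i > max arm has to be kept because the loop body still runs once.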

This structure is essentially created by dom2, when it jump-threads (?) the
first "if (i>max) break" out of the loop, such that the outer loop now executes
"if (i>max) break" after the inner loop (rather than testing "if (i>max) break"
before the inner loop, as it still did following 107.ch2). So as an
alternative, possibly tweaking the jump-threading/loop-peeling heuristics might
help (?).
