https://gcc.gnu.org/bugzilla/show_bug.cgi?id=43174

--- Comment #3 from amker at gcc dot gnu.org ---
Note that for the three-level loop example, GCC chooses one IV for both the j
and k loops, and thus generates pretty clean output on x86_64 at -O2.

For the simple example, GCC can now eliminate the comparison IV in favor of
the address candidate, and generates the code below:

  <bb 2>:
  ivtmp.18_14 = (unsigned long) &a;
  _10 = &a + 60516;
  _28 = (unsigned long) _10;
  goto <bb 7>;

  <bb 3>:

  <bb 4>:
  # s_18 = PHI <s_9(3), s_19(7)>
  # ivtmp.9_12 = PHI <ivtmp.9_1(3), ivtmp.9_4(7)>
  _22 = (void *) ivtmp.9_12;
  _8 = MEM[base: _22, offset: 0B];
  s_9 = _8 + s_18;
  ivtmp.9_1 = ivtmp.9_12 + 4;
  if (ivtmp.9_1 != _27)
    goto <bb 3>;
  else
    goto <bb 5>;

  <bb 5>:
  # s_17 = PHI <s_9(4)>
  ivtmp.18_15 = ivtmp.18_5 + 492;
  if (ivtmp.18_15 != _28)
    goto <bb 6>;
  else
    goto <bb 8>;

  <bb 6>:

  <bb 7>:
  # s_19 = PHI <s_17(6), 0(2)>
  # ivtmp.18_5 = PHI <ivtmp.18_15(6), ivtmp.18_14(2)>
  ivtmp.9_4 = ivtmp.18_5;
  _29 = ivtmp.18_5 + 492;
  _27 = _29;
  goto <bb 4>;

  <bb 8>:
  # s_16 = PHI <s_17(5)>
  return s_16;

With this, the following GIMPLE optimizers are able to CSE the two computations
of "ivtmp.18_5 + 492".  As a result, optimal code is generated, as shown in the
optimized dump:


  <bb 2>:
  ivtmp.18_14 = (unsigned long) &a;
  _28 = (unsigned long) &MEM[(void *)&a + 60516B];
  goto <bb 5>;

  <bb 3>:
  # s_18 = PHI <s_9(3), s_19(5)>
  # ivtmp.9_12 = PHI <ivtmp.9_1(3), ivtmp.18_5(5)>
  _22 = (void *) ivtmp.9_12;
  _8 = MEM[base: _22, offset: 0B];
  s_9 = _8 + s_18;
  ivtmp.9_1 = ivtmp.9_12 + 4;
  if (ivtmp.9_1 != _29)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 4>:
  if (_28 != _29)
    goto <bb 5>;
  else
    goto <bb 6>;

  <bb 5>:
  # s_19 = PHI <s_9(4), 0(2)>
  # ivtmp.18_5 = PHI <_29(4), ivtmp.18_14(2)>
  _29 = ivtmp.18_5 + 492;
  goto <bb 3>;

  <bb 6>:
  return s_9;

This is in effect the transformation you wanted in this PR, but I doubt GCC
can do this if it can't eliminate the inner loop's comparison using ivtmp.9_1
in the first place.
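
For readers less used to GIMPLE, the optimized dump corresponds roughly to the
C sketch below.  This is only an illustration, not the PR's test case: the
array shape int a[123][123] is an assumption inferred from the constants 492
(123 * 4) and 60516 (123 * 123 * 4) with a 4-byte int, and the function names
sum_indexed and sum_pointer are hypothetical.

```c
#include <stdint.h>

#define N 123   /* assumed: 492 / sizeof(int), with a 4-byte int */

int a[N][N];

/* Index-based form, presumably close to the source of the simple example. */
int sum_indexed(void)
{
    int s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Hand-written approximation of the optimized dump: both loops are
   controlled by address comparisons, with no separate counter IVs. */
int sum_pointer(void)
{
    int s = 0;
    uintptr_t row = (uintptr_t) &a[0][0];        /* ivtmp.18_5 */
    uintptr_t end = row + sizeof a;              /* _28 */

    do {
        uintptr_t row_end = row + sizeof a[0];   /* _29 = ivtmp.18_5 + 492 */
        uintptr_t p = row;                       /* ivtmp.9_12 */

        do {
            s += *(int *) p;                     /* MEM[base: _22, offset: 0B] */
            p += sizeof(int);                    /* ivtmp.9_1 = ivtmp.9_12 + 4 */
        } while (p != row_end);

        row = row_end;                           /* outer IV reuses _29 */
    } while (row != end);

    return s;
}
```

The point of the sketch is that the inner loop's bound (row_end) doubles as
the next value of the outer IV, which is what the CSE of "ivtmp.18_5 + 492"
achieves in the dump.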

On the other hand, in cases where we can use the IV's final value, it may be
likely that GCC can eliminate the comparison IV.
