[Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)

ktkachov at gcc dot gnu.org Fri, 18 Sep 2015 07:33:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628


            Bug ID: 67628
           Summary: [tree-optimization] (a && b) && c shows better codegen
                    than a && (b && c)
           Product: gcc
           Version: 6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ktkachov at gcc dot gnu.org
  Target Milestone: ---

Consider the two functions:
int
foo1 (int a, int b, int c, int d)
{
  return a > b && b <= c && c > d;
}

int
foo2 (int a, int b, int c, int d)
{
  return a > b && (b <= c && c > d);
}

On aarch64 foo1 generates:
foo1:
        cmp     w1, w2
        ccmp    w2, w3, 4, le
        ccmp    w0, w1, 4, gt
        cset    w0, gt
        ret

but for foo2 generates:
foo2:
        cmp     w0, w1
        ble     .L4
        cmp     w1, w2
        cset    w1, le
        cmp     w2, w3
        cset    w0, gt
        and     w0, w1, w0
        ret


Something similar is observed on x86_64 where foo2 contains a conditional
branch instruction where foo1 is a single basic block

In foo2 we end up generating multiple basic blocks whereas for foo1 we manage
to merge them all into 1 basic block which ends up going through the
conditional-compare pass nicely.

Looking at the final .optimized tree dump the foo1 tree is:
  _BoolD.2673 _1;
  _BoolD.2673 _4;
  _BoolD.2673 _6;
  _BoolD.2673 _10;
  _BoolD.2673 _11;
  intD.7 _12;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 1, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  # RANGE [0, 1]
  _4 = a_2(D) > b_3(D);
  # RANGE [0, 1]
  _6 = b_3(D) <= c_5(D);
  # RANGE [0, 1]
  _10 = c_5(D) > d_8(D);
  # RANGE [0, 1]
  _1 = _6 & _10;
  # RANGE [0, 1]
  _11 = _1 & _4;
  # RANGE [0, 1] NONZERO 1
  _12 = (intD.7) _11;
  # VUSE <.MEM_9(D)>
  return _12;
;;    succ:       EXIT [100.0%] 




whereas for foo2 it's more complex:
  intD.7 iftmp.0_1;
  _BoolD.2673 _5;
  _BoolD.2673 _7;
  _BoolD.2673 _8;
  intD.7 _10;
  _BoolD.2673 _11;

;;   basic block 2, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 0, next block 3, flags: (NEW, REACHABLE)
;;    pred:       ENTRY [100.0%]  (FALLTHRU,EXECUTABLE)
  if (a_2(D) > b_3(D))
    goto <bb 3>;
  else
    goto <bb 4>;
;;    succ:       3 [50.0%]  (TRUE_VALUE,EXECUTABLE)
;;                4 [50.0%]  (FALSE_VALUE,EXECUTABLE)

;;   basic block 3, loop depth 0, count 0, freq 5000, maybe hot
;;    prev block 2, next block 4, flags: (NEW, REACHABLE)
;;    pred:       2 [50.0%]  (TRUE_VALUE,EXECUTABLE)
  # RANGE [0, 1]
  _5 = b_3(D) <= c_4(D);
  # RANGE [0, 1]
  _7 = c_4(D) > d_6(D);
  # RANGE [0, 1]
  _8 = _5 & _7;
;;    succ:       4 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 4, loop depth 0, count 0, freq 10000, maybe hot
;;    prev block 3, next block 1, flags: (NEW, REACHABLE)
;;    pred:       3 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                2 [50.0%]  (FALSE_VALUE,EXECUTABLE)
  # _11 = PHI <_8(3), 0(2)>
  # RANGE [0, 1] NONZERO 1
  iftmp.0_1 = (intD.7) _11;
  # VUSE <.MEM_9(D)>
  return iftmp.0_1;
;;    succ:       EXIT [100.0%] 



If we were to pick some kind of canonicalization for these equivalent
expressions it would make life easier for later passes to generate consistent
code.

[Bug tree-optimization/67628] New: [tree-optimization] (a && b) && c shows better codegen than a && (b && c)

Reply via email to