https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67628
Bug ID: 67628 Summary: [tree-optimization] (a && b) && c shows better codegen than a && (b && c) Product: gcc Version: 6.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: ktkachov at gcc dot gnu.org Target Milestone: --- Consider the two functions: int foo1 (int a, int b, int c, int d) { return a > b && b <= c && c > d; } int foo2 (int a, int b, int c, int d) { return a > b && (b <= c && c > d); } On aarch64 foo1 generates: foo1: cmp w1, w2 ccmp w2, w3, 4, le ccmp w0, w1, 4, gt cset w0, gt ret but for foo2 generates: foo2: cmp w0, w1 ble .L4 cmp w1, w2 cset w1, le cmp w2, w3 cset w0, gt and w0, w1, w0 ret Something similar is observed on x86_64 where foo2 contains a conditional branch instruction where foo1 is a single basic block In foo2 we end up generating multiple basic blocks whereas for foo1 we manage to merge them all into 1 basic block which ends up going through the conditional-compare pass nicely. Looking at the final .optimized tree dump the foo1 tree is: _BoolD.2673 _1; _BoolD.2673 _4; _BoolD.2673 _6; _BoolD.2673 _10; _BoolD.2673 _11; intD.7 _12; ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot ;; prev block 0, next block 1, flags: (NEW, REACHABLE) ;; pred: ENTRY [100.0%] (FALLTHRU,EXECUTABLE) # RANGE [0, 1] _4 = a_2(D) > b_3(D); # RANGE [0, 1] _6 = b_3(D) <= c_5(D); # RANGE [0, 1] _10 = c_5(D) > d_8(D); # RANGE [0, 1] _1 = _6 & _10; # RANGE [0, 1] _11 = _1 & _4; # RANGE [0, 1] NONZERO 1 _12 = (intD.7) _11; # VUSE <.MEM_9(D)> return _12; ;; succ: EXIT [100.0%] whereas for foo2 it's more complex: intD.7 iftmp.0_1; _BoolD.2673 _5; _BoolD.2673 _7; _BoolD.2673 _8; intD.7 _10; _BoolD.2673 _11; ;; basic block 2, loop depth 0, count 0, freq 10000, maybe hot ;; prev block 0, next block 3, flags: (NEW, REACHABLE) ;; pred: ENTRY [100.0%] (FALLTHRU,EXECUTABLE) if (a_2(D) > b_3(D)) goto <bb 3>; else goto <bb 4>; ;; succ: 3 [50.0%] (TRUE_VALUE,EXECUTABLE) ;; 4 [50.0%] (FALSE_VALUE,EXECUTABLE) ;; basic block 3, loop depth 0, count 0, freq 5000, maybe hot ;; prev block 2, next block 4, flags: (NEW, REACHABLE) ;; pred: 2 [50.0%] (TRUE_VALUE,EXECUTABLE) # RANGE [0, 1] _5 = b_3(D) <= c_4(D); # RANGE [0, 1] _7 = c_4(D) > d_6(D); # RANGE [0, 1] _8 = _5 & _7; ;; succ: 4 [100.0%] (FALLTHRU,EXECUTABLE) ;; basic block 4, loop depth 0, count 0, freq 10000, maybe hot ;; prev block 3, next block 1, flags: (NEW, REACHABLE) ;; pred: 3 [100.0%] (FALLTHRU,EXECUTABLE) ;; 2 [50.0%] (FALSE_VALUE,EXECUTABLE) # _11 = PHI <_8(3), 0(2)> # RANGE [0, 1] NONZERO 1 iftmp.0_1 = (intD.7) _11; # VUSE <.MEM_9(D)> return iftmp.0_1; ;; succ: EXIT [100.0%] If we were to pick some kind of canonicalization for these equivalent expressions it would make life easier for later passes to generate consistent code.