On Thu, Jul 5, 2012 at 2:49 PM, Tom de Vries <tom_devr...@mentor.com> wrote: > On 04/07/12 19:02, Ulrich Weigand wrote: >> Any suggestions how to fix this? Should tail merging detect >> __builtin_unreachable and not merge such block? Or else, should >> the CFG optimizer be extended (how?) to handle unreachable blocks >> with multiple predecessors better? > > Ulrich, > > I extended the example as such: > ... > int > foo(int a) > { > if (a <= 0) > { > #ifdef MERGED > L1: > #endif > __builtin_unreachable(); > } > if (a > 2) > #ifdef MERGED > goto L1; > #else > __builtin_unreachable(); > #endif > return a > 0; > } > ... > > Indeed I can reproduce the problem: > ... > $ gcc unreachable.c -O2 -S -o- -ftree-tail-merge > foo: > .cfi_startproc > testl %edi, %edi > jle .L3 > cmpl $2, %edi > jg .L3 > movl $1, %eax > ret > .L3: > .cfi_endproc > ... > > And I can make the problem go away with -fno-tree-tail-merge: > ... > $ gcc unreachable.c -O2 -S -o- -fno-tree-tail-merge > foo: > .cfi_startproc > movl $1, %eax > ret > .cfi_endproc > ... > > But I can also reproduce the problem with -fno-tree-tail-merge by using > -DMERGED: > ,,, > $ gcc unreachable.c -O2 -S -o- -fno-tree-tail-merge -DMERGED > foo: > .cfi_startproc > testl %edi, %edi > jle .L3 > cmpl $2, %edi > jg .L3 > movl $1, %eax > ret > .L3: > .cfi_endproc > ,,, > > So it seems that this is a problem of a missed optimization, triggered by, but > not exclusively triggered by -ftree-tail-merge. > > How to fix the missed optimization, I'm not sure where it should be done. > > I think the optimization could be done in tree-vrp. Currently the assert > expressions look like this: > ... > .foo (intD.6 aD.1711) > { > _BoolD.1693 D.1720; > intD.6 D.1719; > > # BLOCK 2 freq:10000 > # PRED: ENTRY [100.0%] (fallthru,exec) > if (aD.1711_1(D) <= 0) > goto <bb 3>; > else > goto <bb 4>; > # SUCC: 3 [0.0%] (true,exec) 4 [100.0%] (false,exec) > > # BLOCK 3 freq:4 > # PRED: 2 [0.0%] (true,exec) > # VUSE <.MEMD.1722_4(D)> > # USE = nonlocal > # CLB = nonlocal > __builtin_unreachableD.997 (); > # SUCC: > > # BLOCK 4 freq:9996 > # PRED: 2 [100.0%] (false,exec) > aD.1711_6 = ASSERT_EXPR <aD.1711_1(D), aD.1711_1(D) > 0>; > if (aD.1711_6 > 2) > goto <bb 5>; > else > goto <bb 6>; > # SUCC: 5 [0.0%] (true,exec) 6 [100.0%] (false,exec) > > # BLOCK 5 freq:4 > # PRED: 4 [0.0%] (true,exec) > # VUSE <.MEMD.1722_4(D)> > # USE = nonlocal > # CLB = nonlocal > __builtin_unreachableD.997 (); > # SUCC: > > # BLOCK 6 freq:9992 > # PRED: 4 [100.0%] (false,exec) > aD.1711_5 = ASSERT_EXPR <aD.1711_6, aD.1711_6 <= 2>; > D.1720_2 = aD.1711_5 > 0; > D.1719_3 = (intD.6) D.1720_2; > # VUSE <.MEMD.1722_4(D)> > return D.1719_3; > # SUCC: EXIT [100.0%] > > } > .. > > The asserts allow the return result to be optimized, but not the cfg > conditions. > > AFAIU, we can insert the asserts earlier. F.i., we can insert > aD.1711_6 = ASSERT_EXPR <aD.1711_1(D), aD.1711_1(D) > 0> > before the GIMPLE_COND in bb2. > > Attached proof-of-concept patch implements this, and works for this example > both > with and without -DMERGED, independent of -ftree-tail-merge.
You don't need to do it this way - you can simply remove the CFG path feeding __builtin_unreachable. Putting the assert before the if looks a bit backward, so I'd prefer to not do that. > The only problem I can see with this approach is that doing this early (vrp1) > basically removes range annotations which might have had an effect later on. > It > could be and idea to only do this in vrp2, which is not much earlier than > expand, were the rtl cfg optimization kicks in for the first time. Indeed. Richard. > Thanks, > - Tom