https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545
--- Comment #17 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #16)
> I have moved tracer before the late cleanups, which seems to be a rather
> obvious thing to do. This lets us optimize the testcase (with -O2):
>
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
>
>   <bb 2>:
>
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
>
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
>
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
>
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
>
>   <bb 7>:
>   return 0;
>
> }
>
> Martin, I do not have a SPEC setup; do you think you can benchmark the
> attached patch with SPEC and profile feedback, and also non-FDO -O3 -ftracer
> compared to plain -O3, please?
> It would be nice to know the code size impact, too.
>
> Index: passes.def
> ===================================================================
> --- passes.def	(revision 215651)
> +++ passes.def	(working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
>  	  NEXT_PASS (pass_dce);
>  	  NEXT_PASS (pass_call_cdce);
>  	  NEXT_PASS (pass_cselim);
> +	  NEXT_PASS (pass_tracer);
>  	  NEXT_PASS (pass_copy_prop);
>  	  NEXT_PASS (pass_tree_ifcombine);
>  	  NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_cse_reciprocals);
>        NEXT_PASS (pass_reassoc);
>        NEXT_PASS (pass_strength_reduction);
> -      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_dominator);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_vrp);
>
> Doing it at approximately the same place as loop header copying seems
> to make most sense to me.
> It benefits from early cleanups and DCE definitely, and it should enable
> more fun with the later scalar passes, which are almost all rerun then.

We can try some internal benchmarks with this change too.

David
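For context on why moving `pass_tracer` earlier helps here: tracer performs tail duplication, so the shared join block (bb 6 above) can be copied into each predecessor, giving each copy a single reaching definition of `ap` and hence a statically known vptr store. That lets the indirect destructor call be devirtualized and inlined per copy; since both inlined destructors reduce to a plain `operator delete`, the copies collapse back to the direct call seen in the dump. A rough sketch of the intermediate, duplicated shape (not an actual dump from this patch):

```
  <bb 4>:
    ap_8 = operator new (16);
    /* ... initialize A, store &_ZTV1A vptr ... */
    operator delete (ap_8);    /* copy of bb 6: dynamic type known to be A */
    i_22 = i_29 + 1;
    if (i_22 != 10000) goto <bb 3>; else goto <bb 7>;

  <bb 5>:
    ap_13 = operator new (16);
    /* ... initialize B, store &_ZTV1B vptr ... */
    operator delete (ap_13);   /* copy of bb 6: dynamic type known to be B */
    i_23 = i_29 + 1;
    if (i_23 != 10000) goto <bb 3>; else goto <bb 7>;
```

Running tracer before the late scalar passes (rather than after strength reduction) means DOM, VRP, and friends all see this duplicated, devirtualized form.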