https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35545

--- Comment #17 from davidxl <xinliangli at gmail dot com> ---
(In reply to Jan Hubicka from comment #16)
> I have moved tracer before the late cleanups that seems to be rather obbious
> thing to do. This lets us to optimize the testcase (with -O2):
> int main() ()
> {
>   struct A * ap;
>   int i;
>   int _6;
> 
>   <bb 2>:
> 
>   <bb 3>:
>   # i_29 = PHI <i_22(6), 0(2)>
>   _6 = i_29 % 7;
>   if (_6 == 0)
>     goto <bb 4>;
>   else
>     goto <bb 5>;
> 
>   <bb 4>:
>   ap_8 = operator new (16);
>   ap_8->i = 0;
>   ap_8->_vptr.A = &MEM[(void *)&_ZTV1A + 16B];
>   goto <bb 6>;
> 
>   <bb 5>:
>   ap_13 = operator new (16);
>   MEM[(struct B *)ap_13].D.2244.i = 0;
>   MEM[(struct B *)ap_13].b = 0;
>   MEM[(struct B *)ap_13].D.2244._vptr.A = &MEM[(void *)&_ZTV1B + 16B];
> 
>   <bb 6>:
>   # ap_4 = PHI <ap_13(5), ap_8(4)>
>   operator delete (ap_4);
>   i_22 = i_29 + 1;
>   if (i_22 != 10000)
>     goto <bb 3>;
>   else
>     goto <bb 7>;
> 
>   <bb 7>:
>   return 0;
> 
> }
> 
> Martin, I do not have SPEC setup, do you think you can benchmark the
> attached patch with SPEC and profile feedback and also non-FDO -O3 -ftracer
> compared to -O3, please?
> It would be nice to know code size impact, too.
> Index: passes.def
> ===================================================================
> --- passes.def  (revision 215651)
> +++ passes.def  (working copy)
> @@ -155,6 +155,7 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_dce);
>        NEXT_PASS (pass_call_cdce);
>        NEXT_PASS (pass_cselim);
> +      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_copy_prop);
>        NEXT_PASS (pass_tree_ifcombine);
>        NEXT_PASS (pass_phiopt);
> @@ -252,7 +253,6 @@ along with GCC; see the file COPYING3.
>        NEXT_PASS (pass_cse_reciprocals);
>        NEXT_PASS (pass_reassoc);
>        NEXT_PASS (pass_strength_reduction);
> -      NEXT_PASS (pass_tracer);
>        NEXT_PASS (pass_dominator);
>        NEXT_PASS (pass_strlen);
>        NEXT_PASS (pass_vrp);
> 
> Doing it at same approximately the same place as loop header copying seems
> to make most sense to me.  It benefits from early cleanups and DCE definitly
> and it should enable more fun with the later scalar passes that are almost
> all rerun then.


WE can try some internal benchmarks with this change too.

David

Reply via email to