Re: [AutoFDO] Profile merging for clone test

Jan Hubicka Tue, 10 Jun 2025 01:33:54 -0700

> Hi,
> 
> > On 4 Jun 2025, at 9:53 pm, Jan Hubicka <hubi...@ucw.cz> wrote:
> > 
> > External email: Use caution opening links or attachments
> > 
> > 
> >> This patch introduces a new testcase to verify the merging of profiles
> >> is performed for cloned functions.
> >> 
> >> Since this is invoked very early, before the pass manager, we need to
> >> set up the dumping explicitly. This is similar to the handling in
> >> finish_optimization_passes.
> >> 
> >> gcc/ChangeLog:
> >> 
> >>        * auto-profile.cc (autofdo_source_profile::read): Dump message
> >>        while merging profile.
> >>        * pass_manager.h (get_pass_auto_profile): New.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 
> >>        * gcc.dg/tree-prof/clone-merge-1.c: New test.
> >> 
> >> Is this OK?
> > OK, thanks!
> > It would be nice to also test that some afdo based optimization happens
> > (i.e. have indirect call in the cloned function or so), so we check that
> > the merged data actually works, but having a dump and check that
> > something is done is already good :)
> > 
> > BTW I now see main remaining spec2k17 regresions in omnetpp and
> > imagemagick (small regressions are also in deepsjeng, leela, namd).
> > I plan to investigate them, or do you happen to know what is goind on
> > there?
> > 
> > I still test with LTO and post-AFDO cloning disabled. Newly I noticed
> > that the create_gcov tool seems to mishandle partitioned functions, so I
> > also disable function partitioning (which is not on by default on ARM,
> > but I think it would be useful to get it working).
> 
> Thanks! Do you log your results publicly as you do with other spec2017 runs?
> 
> I tried running spec2017 to get some latest numbers but hittting assertion 
> with `profile_count::compatible_p`. Should we make GUESSED_LOCAL count too to 
> AFDO type
> or drop it to zero.
> 
> Here is an example ICE:
> 
> Cactus/piraha/Bracket.cc:133:1: internal compiler error: in 
> combine_with_ipa_count_within, at profile-count.cc:414
> 133 | }
> | ^
> 0x25f995b internal_error(char const*, ...)
> ../../gcc/gcc/diagnostic-global-context.cc:517
> 0x8589eb fancy_abort(char const*, int, char const*)
> ../../gcc/gcc/diagnostic.cc:1803
> 0x1269533 profile_count::combine_with_ipa_count_within(profile_count, 
> profile_count)
> ../../gcc/gcc/profile-count.cc:414
> 0xdbd83b cgraph_node::create_clone(tree_node*, profile_count, bool, 
> vec<cgraph_edge*, va_heap, vl_ptr>, bool, cgraph_node*, 
> ipa_param_adjustments*, char const*)
> ../../gcc/gcc/cgraphclones.cc:410
> 0x108603f clone_inlined_nodes(cgraph_edge*, bool, bool, int*)
> ../../gcc/gcc/ipa-inline-transform.cc:224
> 0x10860a3 clone_inlined_nodes(cgraph_edge*, bool, bool, int*)
> ../../gcc/gcc/ipa-inline-transform.cc:248
> 0x1086947 inline_call(cgraph_edge*, bool, vec<cgraph_edge*, va_heap, 
> vl_ptr>*, int*, bool, bool*)
> ../../gcc/gcc/ipa-inline-transform.cc:516
> 0x23fe81b inline_small_functions
> ../../gcc/gcc/ipa-inline.cc:2434
> 0x23fe81b ipa_inline
> ../../gcc/gcc/ipa-inline.cc:2916
> 0x23fe81b execute
> ../../gcc/gcc/ipa-inline.cc:3314 <http://ipa-inline.cc:3314/>
> 
> --- a/gcc/auto-profile.cc
> +++ b/gcc/auto-profile.cc
> @@ -1119,7 +1119,8 @@ update_count_by_afdo_count (profile_count *count, 
> gcov_type c)
> *count = profile_count::from_gcov_type (c).afdo ();
> /* In case we have guessed profile which is already zero, preserve
> quality info. */
> - else if (count->nonzero_p ()
> + else if (!count->nonzero_p ()
> + || count->quality () == GUESSED_LOCAL
> || count->quality () == GUESSED)
> *count = profile_count::zero ().afdo ();
> }
Sorry for the breakage. I just noticed that independently too
(I used release checking compiler for CPU build and did not see this
before) and pushed the following fix.


The patch also improves handling of edge probabilities to make edges
staticaly predicted as 0 (such as EH edges or longjmp ones) to be
predicted reliably if auto-fdo confirms that and similarly for edges
with 100% probability.  This is a common case of single non-0 edge out
of BB.

I hope Petr will set up regular LNT CPU2017 tester (on zen5 CPU) today
or tomorrow. I re-benchmarked CPU2017 with -march=native -flto -Ofast.
For train run I now disable lto, ipa-icf and function partitioning,
since they confuse profile noticeably.  Still my results are worse then
without FDO:

500.perlbench_r       1    158          10.1   *       1    151          10.5   
*
502.gcc_r                                     NR                               
NR
505.mcf_r             1    185           8.74  *       1    193           8.38  
*
520.omnetpp_r         1    184           7.14  *       1    193           6.78  
*
523.xalancbmk_r                               NR                               
NR
525.x264_r            1     85.3        20.5   *       1     85.8        20.4   
*
531.deepsjeng_r       1    163           7.01  *       1    178           6.42  
*
541.leela_r           1    271           6.12  *       1    292           5.68  
*
548.exchange2_r       1     86.0        30.5   *       1    108          24.3   
*
557.xz_r              1    226           4.78  *       1    230           4.69  
*
 Est. SPECrate2017_int_base              9.73
 Est. SPECrate2017_int_peak                                               9.19

exchange is likely due to ipa-cp cloning heuristics, but I did not look
into omnetpp or deepsjeng yet.  exchange is quite special, but it would
be nice to understand omnetpp and deepsjeng.

gcc/ChangeLog:

        * auto-profile.cc (update_count_by_afdo_count): Fix handling
        of GUESSED_LOCAL.
        (afdo_calculate_branch_prob): Preserve static profile for
        probabilities 0 and 1.

diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc
index 215dadf87c2..91b1e97f5bb 100644
--- a/gcc/auto-profile.cc
+++ b/gcc/auto-profile.cc
@@ -1120,7 +1120,8 @@ update_count_by_afdo_count (profile_count *count, 
gcov_type c)
   /* In case we have guessed profile which is already zero, preserve
      quality info.  */
   else if (count->nonzero_p ()
-          || count->quality () == GUESSED)
+          || count->quality () == GUESSED
+          || count->quality () == GUESSED_LOCAL)
     *count = profile_count::zero ().afdo ();
 }
 
@@ -1515,8 +1516,21 @@ afdo_calculate_branch_prob (bb_set *annotated_bb)
     if (num_unknown_succ == 0 && total_count.nonzero_p ())
       {
        FOR_EACH_EDGE (e, ei, bb->succs)
-         e->probability
-           = AFDO_EINFO (e)->get_count ().probability_in (total_count);
+         {
+           /* If probability is 1, preserve reliable static prediction
+              (This is, for example the case of single fallthru edge
+               or single fallthru plus unlikely EH edge.)  */
+           if (AFDO_EINFO (e)->get_count () == total_count
+               && e->probability == profile_probability::always ())
+             ;
+           else if (AFDO_EINFO (e)->get_count ().nonzero_p ())
+             e->probability
+               = AFDO_EINFO (e)->get_count ().probability_in (total_count);
+           /* If probability is zero, preserve reliable static prediction.  */
+           else if (e->probability.nonzero_p ()
+                    || e->probability.quality () == GUESSED)
+             e->probability = profile_probability::never ().afdo ();
+         }
       }
   }
   FOR_ALL_BB_FN (bb, cfun)

Re: [AutoFDO] Profile merging for clone test

Reply via email to