> Hi, > > > On 4 Jun 2025, at 9:53 pm, Jan Hubicka <hubi...@ucw.cz> wrote: > > > > External email: Use caution opening links or attachments > > > > > >> This patch introduces a new testcase to verify the merging of profiles > >> is performed for cloned functions. > >> > >> Since this is invoked very early, before the pass manager, we need to > >> set up the dumping explicitly. This is similar to the handling in > >> finish_optimization_passes. > >> > >> gcc/ChangeLog: > >> > >> * auto-profile.cc (autofdo_source_profile::read): Dump message > >> while merging profile. > >> * pass_manager.h (get_pass_auto_profile): New. > >> > >> gcc/testsuite/ChangeLog: > >> > >> * gcc.dg/tree-prof/clone-merge-1.c: New test. > >> > >> Is this OK? > > OK, thanks! > > It would be nice to also test that some afdo based optimization happens > > (i.e. have indirect call in the cloned function or so), so we check that > > the merged data actually works, but having a dump and check that > > something is done is already good :) > > > > BTW I now see main remaining spec2k17 regresions in omnetpp and > > imagemagick (small regressions are also in deepsjeng, leela, namd). > > I plan to investigate them, or do you happen to know what is goind on > > there? > > > > I still test with LTO and post-AFDO cloning disabled. Newly I noticed > > that the create_gcov tool seems to mishandle partitioned functions, so I > > also disable function partitioning (which is not on by default on ARM, > > but I think it would be useful to get it working). > > Thanks! Do you log your results publicly as you do with other spec2017 runs? > > I tried running spec2017 to get some latest numbers but hittting assertion > with `profile_count::compatible_p`. Should we make GUESSED_LOCAL count too to > AFDO type > or drop it to zero. > > Here is an example ICE: > > Cactus/piraha/Bracket.cc:133:1: internal compiler error: in > combine_with_ipa_count_within, at profile-count.cc:414 > 133 | } > | ^ > 0x25f995b internal_error(char const*, ...) > ../../gcc/gcc/diagnostic-global-context.cc:517 > 0x8589eb fancy_abort(char const*, int, char const*) > ../../gcc/gcc/diagnostic.cc:1803 > 0x1269533 profile_count::combine_with_ipa_count_within(profile_count, > profile_count) > ../../gcc/gcc/profile-count.cc:414 > 0xdbd83b cgraph_node::create_clone(tree_node*, profile_count, bool, > vec<cgraph_edge*, va_heap, vl_ptr>, bool, cgraph_node*, > ipa_param_adjustments*, char const*) > ../../gcc/gcc/cgraphclones.cc:410 > 0x108603f clone_inlined_nodes(cgraph_edge*, bool, bool, int*) > ../../gcc/gcc/ipa-inline-transform.cc:224 > 0x10860a3 clone_inlined_nodes(cgraph_edge*, bool, bool, int*) > ../../gcc/gcc/ipa-inline-transform.cc:248 > 0x1086947 inline_call(cgraph_edge*, bool, vec<cgraph_edge*, va_heap, > vl_ptr>*, int*, bool, bool*) > ../../gcc/gcc/ipa-inline-transform.cc:516 > 0x23fe81b inline_small_functions > ../../gcc/gcc/ipa-inline.cc:2434 > 0x23fe81b ipa_inline > ../../gcc/gcc/ipa-inline.cc:2916 > 0x23fe81b execute > ../../gcc/gcc/ipa-inline.cc:3314 <http://ipa-inline.cc:3314/> > > --- a/gcc/auto-profile.cc > +++ b/gcc/auto-profile.cc > @@ -1119,7 +1119,8 @@ update_count_by_afdo_count (profile_count *count, > gcov_type c) > *count = profile_count::from_gcov_type (c).afdo (); > /* In case we have guessed profile which is already zero, preserve > quality info. */ > - else if (count->nonzero_p () > + else if (!count->nonzero_p () > + || count->quality () == GUESSED_LOCAL > || count->quality () == GUESSED) > *count = profile_count::zero ().afdo (); > } Sorry for the breakage. I just noticed that independently too (I used release checking compiler for CPU build and did not see this before) and pushed the following fix.
The patch also improves handling of edge probabilities to make edges staticaly predicted as 0 (such as EH edges or longjmp ones) to be predicted reliably if auto-fdo confirms that and similarly for edges with 100% probability. This is a common case of single non-0 edge out of BB. I hope Petr will set up regular LNT CPU2017 tester (on zen5 CPU) today or tomorrow. I re-benchmarked CPU2017 with -march=native -flto -Ofast. For train run I now disable lto, ipa-icf and function partitioning, since they confuse profile noticeably. Still my results are worse then without FDO: 500.perlbench_r 1 158 10.1 * 1 151 10.5 * 502.gcc_r NR NR 505.mcf_r 1 185 8.74 * 1 193 8.38 * 520.omnetpp_r 1 184 7.14 * 1 193 6.78 * 523.xalancbmk_r NR NR 525.x264_r 1 85.3 20.5 * 1 85.8 20.4 * 531.deepsjeng_r 1 163 7.01 * 1 178 6.42 * 541.leela_r 1 271 6.12 * 1 292 5.68 * 548.exchange2_r 1 86.0 30.5 * 1 108 24.3 * 557.xz_r 1 226 4.78 * 1 230 4.69 * Est. SPECrate2017_int_base 9.73 Est. SPECrate2017_int_peak 9.19 exchange is likely due to ipa-cp cloning heuristics, but I did not look into omnetpp or deepsjeng yet. exchange is quite special, but it would be nice to understand omnetpp and deepsjeng. gcc/ChangeLog: * auto-profile.cc (update_count_by_afdo_count): Fix handling of GUESSED_LOCAL. (afdo_calculate_branch_prob): Preserve static profile for probabilities 0 and 1. diff --git a/gcc/auto-profile.cc b/gcc/auto-profile.cc index 215dadf87c2..91b1e97f5bb 100644 --- a/gcc/auto-profile.cc +++ b/gcc/auto-profile.cc @@ -1120,7 +1120,8 @@ update_count_by_afdo_count (profile_count *count, gcov_type c) /* In case we have guessed profile which is already zero, preserve quality info. */ else if (count->nonzero_p () - || count->quality () == GUESSED) + || count->quality () == GUESSED + || count->quality () == GUESSED_LOCAL) *count = profile_count::zero ().afdo (); } @@ -1515,8 +1516,21 @@ afdo_calculate_branch_prob (bb_set *annotated_bb) if (num_unknown_succ == 0 && total_count.nonzero_p ()) { FOR_EACH_EDGE (e, ei, bb->succs) - e->probability - = AFDO_EINFO (e)->get_count ().probability_in (total_count); + { + /* If probability is 1, preserve reliable static prediction + (This is, for example the case of single fallthru edge + or single fallthru plus unlikely EH edge.) */ + if (AFDO_EINFO (e)->get_count () == total_count + && e->probability == profile_probability::always ()) + ; + else if (AFDO_EINFO (e)->get_count ().nonzero_p ()) + e->probability + = AFDO_EINFO (e)->get_count ().probability_in (total_count); + /* If probability is zero, preserve reliable static prediction. */ + else if (e->probability.nonzero_p () + || e->probability.quality () == GUESSED) + e->probability = profile_probability::never ().afdo (); + } } } FOR_ALL_BB_FN (bb, cfun)