Re: [PATCH] [RFC][AutoFDO] Implement summary information in auto-profile

Jan Hubicka Sat, 18 Oct 2025 16:25:20 -0700

Hello.
> +/* Store the summary information for the profile.  */
> +struct summary_info
> +{
> +  struct detailed_summary
> +  {
> +    uint32_t cutoff;
> +    uint64_t min_count;
> +    uint64_t num_counts;
> +  };
> +
> +  uint64_t total_count;
> +  uint64_t max_count;
> +  uint64_t max_function_count;
> +  uint64_t num_counts;
> +  uint64_t num_functions;
> +  auto_vec<detailed_summary> detailed_summaries;


The summary definitely definitely needs a description of all the fields.
If I get it right, the vector of detailed summary is a histogram of
counts present in the profile, while the rest of structure contains
simple statistics of them. 

In the present form, I am not quite sure what is buys in practice, since
the file format is organized in a way that GCC needs to read all the
counters each invocation and thus it can also compute the statistics
during stream-in.

There is important streaming overhead. AFDO profile of clang compiling
tramp3d (so very small train run) is 82MB when using create_gcov based
on GCC built binary and 168MB when using llvm's equivalent (building
clang by clang and using create_llvm_prof)

Now I get 
 - 0.003795 second per invocation of gcc -S t.c
 - 0.946906 second per invocation of gcc -S t.c -fauto-profile=llvm.gcov
 - 0.011225 second per invocation of clang -S t.c
 - 0.012016 second per invocation of clang -S t.c 
-fprofile-instr-use=llvm.profdata 

1s per compiler invocation is really bad and slows down real world build
quite dramatically. While datastructures are bad and can be fixed, clang
is clearly not reading whole file each invocation.  

So we will need to add a section linking symbol names to offsets of
their instances and allow random access to the file.  With this
streaming summary makes sense.

> +  std::map<uint32_t, unsigned> percentile_idx_map;
This seems to be there only to cache get_threshold_count results which
is called twice

> +uint64_t
> +min_afdo_hot_count (uint32_t cutoff)
> +{
> +  return autofdo::afdo_summary_info->get_threshold_count (cutoff);
> +}
> +
> +/* Get the maximum count needed to be considered cold for the percentile.  */
> +
> +uint64_t
> +max_afdo_cold_count (uint32_t cutoff)
> +{
> +  return autofdo::afdo_summary_info->get_threshold_count (cutoff);
> +}
in these two functions where only first one is invoked.

With LTO GCC has analogous functionality in ipa-profile.cc:ipa_profile
which computes histogram of all basic block counts in the program
weighted by its expected execution counts and computes hot BB cutoff
based on param_hot_bb_count_ws_permille.
We consider hot counts, cold counts and probably never executed counts
(which is 0 count from porfile feedback or static estimation).

So I would drop --param afdo-profile-summary-cutoff-hotcold_count and
use --param hot-bb-count-ws-permille for afdo as well.

> +
>  namespace
>  {
>  
> diff --git a/gcc/auto-profile.h b/gcc/auto-profile.h
> index 3cce5f2152c..66ef20419e8 100644
> --- a/gcc/auto-profile.h
> +++ b/gcc/auto-profile.h
> @@ -42,4 +42,10 @@ extern gcov_type afdo_hot_bb_threshold;
>  /* Return true if COUNT is possibly hot.  */
>  extern bool maybe_hot_afdo_count_p (profile_count count);
>  
> +/* Get the minimum count needed to achieve the hot threshold percentile.  */
> +extern uint64_t min_afdo_hot_count (uint32_t cutoff);
> +
> +/* Get the maximum count needed to be considered cold for the percentile.  */
> +extern uint64_t max_afdo_cold_count (uint32_t cutoff);

I do not think we need these two functions. Auto-profile already sets
hot count threshold so it can use --param hot-bb-count-ws-permille 
instead of current heuristics.
> +
>  #endif /* AUTO_PROFILE_H */
> diff --git a/gcc/c/Make-lang.in b/gcc/c/Make-lang.in
> index f09fc99467b..95a7dad8350 100644
> --- a/gcc/c/Make-lang.in
> +++ b/gcc/c/Make-lang.in
> @@ -102,7 +102,7 @@ create_fdas_for_cc1: ../stage1-gcc/cc1$(exeext) 
> ../prev-gcc/$(PERF_DATA)
>         echo $$perf_path; \
>         if [ -f $$perf_path ]; then \
>           profile_name=cc1_$$component_in_prev.fda; \
> -         $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 2 || exit 1; \
> +         $(CREATE_GCOV) -binary ../stage1-gcc/cc1$(exeext) -gcov 
> $$profile_name -profile $$perf_path -gcov_version 3 || exit 1; \

While we want to bump up gcov version to represent 32bit discriminators,
the gcov file format is intended to be extensible. Since you are just
adding new section I think everything should just work if create_gcov is
updated to add it to version 2.  (i.e. older GCC versions should ignore
it). Does that work?  In the past we did not really need it to work
since gcov format is closely tied with the GCC version (and choice of
early opts) but now I think it is quite useful to get this working since
autofdo profiles are indended to be stable wrt GCC or source code
changes.

Rest of patch looks OK, but can you please drop the params and version
bump and verify that things works as expected?

Thanks,
Honza

Re: [PATCH] [RFC][AutoFDO] Implement summary information in auto-profile

Reply via email to