On 17/06/25 18:35, Jan Hubicka wrote:


From: Dhruv Chawla <dhr...@nvidia.com>

This patch modifies afdo_set_bb_count to propagate profile information
to outline copies of functions if they are not inlined. This information
gets lost otherwise.

Signed-off-by: Dhruv Chawla <dhr...@nvidia.com>

gcc/ChangeLog:

       * auto-profile.cc (count_info): Adjust comments.
       (function_instance::in_afdo_source_profile): New function.
       (function_instance::set_in_afdo_source_profile): Likewise.
       (function_instance::get_callsites): Likewise.
       (autofdo_source_profile::add_function_instance): Likewise.
       (function_instance::in_afdo_source_profile_): New member.
       (autofdo_source_profile::~autofdo_source_profile): Use
       in_afdo_source_profile to prevent a double-free bug.
       (afdo_set_bb_count): Propagate inline information to non-inlined
       outline copy.

@@ -230,6 +224,10 @@ public:
      return head_count_;
    }

+  bool in_afdo_source_profile () const { return in_afdo_source_profile_; }
+
+  void set_in_afdo_source_profile () { in_afdo_source_profile_ = true; }

    /* Map from source location to count_info.  */
    position_count_map pos_counts;
+
+  /* If this is an inline instance tracked in afdo_source_profile.  */
+  bool in_afdo_source_profile_;

What about calling it is_inline_instance?  I think that name makes it
more obvious what it is useful for.

Thanks, that is a much better name than what I came up with :)
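For reference, the destructor change mentioned in the ChangeLog boils
down to roughly the sketch below (sketched from the ChangeLog entry, not
the actual hunk; the map typedef and exact ownership rules are assumed):

  autofdo_source_profile::~autofdo_source_profile ()
  {
    for (name_function_instance_map::const_iterator iter = map_.begin ();
         iter != map_.end (); ++iter)
      /* Sketch only: inline instances entered into map_ are still owned
         (and recursively freed) by their parent function_instance, so
         deleting them here as well would free them twice.  */
      if (!iter->second->in_afdo_source_profile ())
        delete iter->second;
  }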


@@ -814,6 +825,24 @@ autofdo_source_profile::update_inlined_ind_target (gcall *stmt,
    return true;
  }

+/* Add a new function_instance entry if an inlined function is found in the
+   profile that doesn't have a corresponding entry in the map. Return false if
+   the function_instance already exists, true if it doesn't.  */
+
+bool
+autofdo_source_profile::add_function_instance (function_instance *fun)
+{
+  int name = fun->name ();
+  if (auto fun_it = map_.find (name); fun_it != map_.end ())
+    {
+      fun_it->second->merge (fun);

Assume that we have call chain
   foo->bar->baz
and assume that this was inlined in the train run, while after early
inlining we inline bar->baz but not foo->bar.

Will we then also recursively merge foo->bar->baz profile into bar->baz?

Currently only the foo->bar profile will be merged; bar->baz is not
considered. I had thought about doing this before, but it would require
traversing bar to make sure that baz has / has not been inlined when
visiting foo, repeated recursively for however deep the inline chain is.
I wasn't sure how to do this in a less expensive way. I don't think it's
possible to just check whether a cgraph_edge exists from bar->baz,
because there may be multiple bar->baz edges, so the exact source
location needs to be checked.

It may be possible to collect these nested profiles in a pre-pass, but I
will need to experiment more with it.
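Something along these lines, perhaps (just a sketch against the
auto-profile.cc types; inlined_in_cfg is a made-up predicate for "this
callsite actually got inlined", and get_callsites is the accessor this
patch adds):

  /* Recursively fold inline instances that did not get inlined back
     into their offline copies, innermost first.  */
  void
  merge_nested_instances (function_instance *fun)
  {
    for (auto &cs : fun->get_callsites ())
      {
        function_instance *inlined = cs.second;
        /* Fold baz's nested instances before folding bar's.  */
        merge_nested_instances (inlined);
        if (!inlined_in_cfg (cs.first))
          afdo_source_profile->add_function_instance (inlined);
      }
  }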

@@ -1144,6 +1173,31 @@ afdo_set_bb_count (basic_block bb, const stmt_set &promoted)
        gimple *stmt = gsi_stmt (gsi);
        if (gimple_clobber_p (stmt) || is_gimple_debug (stmt))
          continue;
+      if (gimple_code (stmt) == GIMPLE_CALL)
+        {
+          tree fn = gimple_call_fndecl (stmt);
+          function_instance *cur
+            = afdo_source_profile->get_function_instance_by_decl (
+              current_function_decl);
+          if (!cur || !fn)
+            continue;
I think we want to do the merging in a global pass before annotation
starts (to solve the dependency). I think you can relatively cheaply walk
all functions in the callgraph and all their edges, and for each edge see
if the statement is inlined in the profile and do the merging...

Yes, this sounds good. I will try implementing this.
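Roughly like this, I assume (sketch only; merge_profile_for_edge is a
placeholder for the lookup-and-merge logic discussed above):

  static void
  afdo_merge_outline_instances (void)
  {
    cgraph_node *node;
    FOR_EACH_DEFINED_FUNCTION (node)
      for (cgraph_edge *e = node->callees; e; e = e->next_callee)
        /* The call survived early inlining, so any inline instance the
           train run recorded for it must be folded back into the
           callee's offline copy.  */
        merge_profile_for_edge (e);
  }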

Replying to a couple of your other comments here as well:


Yes, I think we first load auto-FDO data (at invocation of the compiler)
and then, during normal early optimization, we early inline direct calls
and already use AFDO data.  This is for inlining indirect calls (and it
is why the pass does VPT), which is a less common scenario.  I agree that
it would make sense to split this into full IPA passes.  I.e.:

pass 1:

  FOR_EACH_FUNCTION
    VPT + early inlining of indirect calls + possibly re-do early
    optimizations when that happened.

pass 2:

  Look up all inlined instances in the AFDO profile and, if inlining did
  not happen, merge them into the corresponding offline copies (as you
  do) and possibly release them.

pass 3:
  FOR_EACH_FUNCTION
    autofdo::afdo_annotate_cfg (promoted_stmts);
    compute_function_frequency ();

Sounds good, I will try and implement this in the next version of the patch
as above.
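For concreteness, the driver I have in mind would be shaped like this
(pass bodies are placeholders for the steps in your outline, and the
helper names are made up for the sketch):

  static void
  afdo_run_passes (void)
  {
    cgraph_node *node;

    /* Pass 1: VPT + early inlining of indirect calls.  */
    FOR_EACH_FUNCTION (node)
      afdo_vpt_and_early_inline (node);

    /* Pass 2: merge inline instances whose inlining did not happen back
       into their offline copies, possibly releasing them.  */
    afdo_merge_outline_instances ();

    /* Pass 3: annotate CFGs now that the profile is complete.  */
    FOR_EACH_FUNCTION (node)
      {
        autofdo::afdo_annotate_cfg (promoted_stmts);
        compute_function_frequency ();
      }
  }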


In the longer term, it would be nice to be able to load FDO and auto-FDO
data at LTO link time. That would make usage easier (and faster), since
one could just re-link with LTO instead of rebuilding the whole tree.  It
would also give us a chance to solve the problems we have with
cross-module inlining.  Those exist in both the -fprofile-use and
auto-profile paths, since with -fprofile-use we may mix up comdats.

To implement that, we could stream the CFG separately from the rest of
the function body and be able to load it at WPA time.  It would probably
need restructuring the CFG representation into a leaner base that is used
for this and on which we can run some basic algorithms, like dominance
computation.

>
> Because the early inliner is invoked while annotation is being done,
> it's possible that not all known information has been propagated to the
> total_count of the function_instance when the early inliner is invoked.
>
> Another problem here is that get_inline_stack returns an empty stack if
> no inlining occurred in the corresponding GIMPLE statement. So if an
Hmm, so we have a bug here?
afdo_callsite_hot_enough_for_early_inline should return true if called
on a non-inlined call edge that is inlined in the train run and has
enough samples in it to be considered hot.

Yeah, seems like a bug to me as well.


Honza

--
Regards,
Dhruv
