Finally had some time to pick this up again. See responses and
questions below. But it looks like you are commenting on a stale
version of the patch. See the update I sent in March:

https://gcc.gnu.org/ml/gcc-patches/2014-03/msg01176.html


On Sun, May 25, 2014 at 10:43 PM, Jan Hubicka <hubi...@ucw.cz> wrote:
>> This patch attempts to address the lost profile issue for COMDATs in
>> more circumstances, exposed by function splitting.
>>
>> My earlier patch handled the case where the comdat had 0 counts since
>> the linker kept the copy in a different module. In that case we
>> prevent the guessed frequencies on 0-count functions from being
>> dropped by counts_to_freqs, and later mark any functions reached via
>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>> comdat is inlined, the call count is propagated to the callee blocks
>> using the guessed probabilities.
>>
>> However, in this case, there was a comdat that had a very small
>> non-zero count but was being inlined into a much hotter call site.
>> This can happen when a copy was ipa-inlined in the profile-gen
>> compile, so the copy in that module gets some non-zero counts from
>> the ipa-inlined instance, but the out-of-line copy was eliminated by
>> the linker (which selected the copy from a different module). In this
>> case the inliner was scaling the bb counts up quite a lot when
>> inlining. The problem is that you most likely can't trust that the
>> 0-count bbs in such a case are really not executed by the call site
>> it is being inlined into, since the counts are very small and
>> correspond to a different call site. In some internal C++ applications
>> I am seeing that we execute out of the split cold portion of COMDATs
>> for this reason.
>
> With all the problems we have with COMDATs, it still seems to me that perhaps
> the best approach would be to merge them at runtime, as we discussed some time
> ago, and incrementally handle the problem of how to get callstack-sensitive
> profiles.

I recently committed some functionality to a google branch to do some
COMDAT profile merging at runtime, but that handles all-zero-count
copies and operates on top of the LIPO runtime infrastructure.
We don't do any fixup for non-zero-count COMDATs, to avoid losing
context-sensitive profiles.

>>
>> This problem is more complicated to address than the 0-count instance,
>> because we need the cgraph to determine which functions to drop the
>> profile on, and at that point the estimated frequencies have already
>> been overwritten by counts_to_freqs. To handle this broader case, I
>> have changed the drop_profile routine to simply re-estimate the
>> probabilities/frequencies (and translate these into counts scaled from
>> the incoming call edge counts). This unfortunately necessitates
>> rebuilding the cgraph, to propagate the new synthesized counts and
>> avoid checking failures downstream. But it will only be rebuilt if we
>> dropped any profiles. With this solution, some of the older approach
>> can be removed (e.g. propagating counts using the guessed
>> probabilities during inlining).
>>
>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>> Also tested on a profile-use build of SPEC cpu2006. Ok for trunk when
>> stage 1 reopens?
>>
>> Thanks,
>> Teresa
>
> 2014-02-12  Teresa Johnson  <tejohn...@google.com>
>
>         * graphite.c (graphite_finalize): Pass new parameter.
>         * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>         * predict.c (tree_estimate_probability): New parameter.
>         (tree_estimate_probability_worker): Renamed from
>         tree_estimate_probability_driver.
>         (tree_reestimate_probability): New function.
>         (tree_estimate_probability_driver): Invoke
>         tree_estimate_probability_worker.
>         (freqs_to_counts): Move here from tree-inline.c.
>         (drop_profile): Re-estimate profiles when dropping counts.
>         (handle_missing_profiles): Drop profiles for some non-zero-count
>         functions as well; get counts from bbs to support invocation before
>         cgraph rebuild.
>         (counts_to_freqs): Remove code obviated by reestimation.
>         * predict.h (tree_estimate_probability): Update declaration.
>         * tree-inline.c (freqs_to_counts): Move to predict.c.
>         (copy_cfg_body): Remove code obviated by reestimation.
>         * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
>         before cgraph rebuild.
>
> Index: params.def
> ===================================================================
> --- params.def  (revision 207436)
> +++ params.def  (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>           "Maximal estimated outcome of branch considered predictable",
>           2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> +         "min-caller-reestimate-ratio",
> +         "Minimum caller-to-callee node count ratio to force reestimated branch "
> +         "probabilities in callee (where 0 means only when callee count is 0)",
> +         10, 0, 0)
>
> OK, so if callee count is 10% of call site count, we do the recomputation?

Right.
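
Concretely, the check amounts to roughly the following (a sketch rather
than the literal patch text; node_count and call_count come from the
entry bb and the call-stmt bbs as described further down, since the
cgraph counts are still 0 at this point):

  /* Drop and re-estimate the callee profile either when its count is
     zero but it has non-zero callers, or when the call site count
     exceeds the callee count by at least the given ratio (a ratio of 0
     disables the second check, so only all-zero profiles are dropped).
     The existing sanity checks on call_count are omitted here.  */
  int ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
  if ((node_count == 0 && call_count > 0)
      || (ratio > 0 && call_count >= node_count * ratio))
    drop_profile (node, call_count);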


>
> @@ -2390,7 +2392,8 @@ void
>    connect_infinite_loops_to_exit ();
>    /* We use loop_niter_by_eval, which requires that the loops have
>       preheaders.  */
> -  create_preheaders (CP_SIMPLE_PREHEADERS);
> +  if (!redo)
> +    create_preheaders (CP_SIMPLE_PREHEADERS);
>
> So you disable preheader construction because this all happens at inlining
> time, when we do not want to change BBs because the callgraph is built, right?
> Are you sure we have preheaders built reliably at IPA time for all functions?
> (It would make sense to do so, but I think loop analysis is done rather
> randomly by IPA analysis passes, such as the inliner.)
> Otherwise the SCEV infrastructure will explode, if I remember correctly.

I can't remember why I guarded this, but it appears it was just
precautionary. I have now made the call to create_preheaders
unconditional, asserting beforehand that the loops already have
preheaders when redo is true, and asserting afterwards that no new
preheaders were created when redo is true:

  gcc_assert (!redo || loops_state_satisfies_p (LOOPS_HAVE_PREHEADERS));
  bool changed = create_preheaders (CP_SIMPLE_PREHEADERS);
  gcc_assert (!(redo && changed));

(I also changed create_preheaders to return a bool indicating whether
any preheaders were created.)

Neither assert fired in the gcc regression tests or in more extensive
tests I ran on internal C++ benchmarks. Should I change the patch to
make the call unconditional and add these asserts?

>
> +/* Force re-estimation of the probabilities, because the profile was
> +   deemed insufficient.  */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> +  basic_block bb;
> +  edge e;
> +  edge_iterator ei;
> +
> +  /* Need to reset the counts to ensure probabilities are recomputed.  */
> +  FOR_EACH_BB_FN (bb, cfun)
> +    {
> +      bb->count = 0;
> +      FOR_EACH_EDGE (e, ei, bb->succs)
> +        e->count = 0;
> +    }
> +  tree_estimate_probability_worker (true);
>
> You also want to remove the recorded loop count estimates, since they are also
> driven by the profile we chose not to trust.

Can you point me to the recorded loop count estimates you are
referring to? I poked around and didn't find them. The places I looked
at used the edge counts to estimate, and seemed to be downstream of this.

>
> @@ -2842,48 +2927,77 @@ handle_missing_profiles (void)
>        gcov_type max_tp_first_run = 0;
>        struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> -      if (node->count)
> -        continue;
>        for (e = node->callers; e; e = e->next_caller)
>        {
> -        call_count += e->count;
> +        call_count += cgraph_function_with_gimple_body_p (e->caller)
> +                      ? gimple_bb (e->call_stmt)->count : 0;
>
> I am quite confused by this change - why don't you want to trust the edge
> counts? Other parts of the patch seem quite reasonable, but I don't really
> follow here.

It is because we are invoking this before rebuilding the cgraph. Here
is the comment to this effect that I added just above
handle_missing_profiles:

+   This is now invoked before rebuilding the cgraph after reading profile
+   counts, so the cgraph edge and node counts are still 0. Therefore we
+   need to look at the counts on the entry bbs and the call stmt bbs.
+   That way we can avoid needing to rebuild the cgraph again to reflect
+   the nodes with dropped profiles.  */
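
In sketch form, the combined effect of that hunk and the comment is
roughly the following (paraphrased, not the literal patch text; fn is
DECL_STRUCT_FUNCTION (node->decl) from the surrounding loop):

  /* The cgraph has not been rebuilt yet, so node->count and e->count
     are still 0; read the profile counts straight from the gimple
     bodies instead.  */
  gcov_type call_count = 0;
  for (e = node->callers; e; e = e->next_caller)
    if (cgraph_function_with_gimple_body_p (e->caller))
      call_count += gimple_bb (e->call_stmt)->count;
  gcov_type node_count
    = (fn && fn->cfg) ? ENTRY_BLOCK_PTR_FOR_FN (fn)->count : 0;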

Thanks,
Teresa

>
> Honza



-- 
Teresa Johnson | Software Engineer | tejohn...@google.com | 408-460-2413
