On Wed, Feb 12, 2014 at 8:02 AM, Teresa Johnson <[email protected]> wrote:
> On Wed, Feb 12, 2014 at 6:45 AM, Teresa Johnson <[email protected]> wrote:
>> On Tue, Feb 11, 2014 at 6:13 PM, Xinliang David Li <[email protected]>
>> wrote:
>>> On Tue, Feb 11, 2014 at 5:36 PM, Teresa Johnson <[email protected]>
>>> wrote:
>>>> On Tue, Feb 11, 2014 at 5:16 PM, Xinliang David Li <[email protected]>
>>>> wrote:
>>>>> Why is call graph needed to determine whether to drop the profile?
>>>>
>>>> Because we detect this situation by looking for cases where the call
>>>> edge counts greatly exceed the callee node count.
>>>>
>>>>>
>>>>> If that is needed, it might be possible to leverage the ipa_profile
>>>>> pass as it will walk through all function nodes to do profile
>>>>> annotation. With this you can make decision to drop callee profile in
>>>>> caller's context.
>>>>
>>>> There are 2 ipa profiling passes, which are somewhat confusingly named
>>>> (to me at least. =). This is being done during the first.
>>>>
>>>> The first is pass_ipa_tree_profile in tree-profile.c, but is a
>>>> SIMPLE_IPA_PASS and has the name "profile" in the dump. The second is
>>>> pass_ipa_profile in ipa-profile.c, which is an IPA_PASS and has the
>>>> name "profile_estimate" in the dump. I assume you are suggesting to
>>>> move this into the latter? But I'm not clear on what benefit that
>>>> gives - the functions are not being traversed in order, so there is
>>>> still the issue of needing to rebuild the cgraph after dropping
>>>> profiles, which might be best done earlier as I have in the patch.
>>>
>>>
>>> I meant the tree-profile one. I think this might work: after all the
>>> function's profile counts are annotated, add another walk of the
>>> call graph nodes to drop bad profiles before the call graph is
>>> rebuilt (the call graph does exist at that point).
>>
>> Ok, so it is already done in tree-profile. But it sounds like you are
>> suggesting reordering it to just above where we update the calls and
>> rebuild the cgraph the first time? As you noted in a follow-on email
>> to me, the cgraph edges don't have the profile counts at that point
>> (and neither do the nodes), so I would need to compare the count on
>> the call's bb to the entry bb count of the callee. That should be
>> doable, let me take a stab at it.
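[Editorial sketch] The detection just described can be illustrated with a small standalone C sketch. The names here (profile_looks_lost and its parameters) are hypothetical simplifications, not the actual GCC internals: sum the counts of the basic blocks containing the call statements (since the cgraph edge and node counts are not populated yet at this point) and compare against the callee's entry-block count, with a ratio mirroring the min-caller-reestimate-ratio param.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t gcov_type;

/* Hypothetical helper: decide whether a callee's profile looks lost.
   CALL_BB_COUNTS are the counts of the basic blocks containing each
   call statement.  A MIN_RATIO of 0 means "only flag 0-count callees
   with non-zero callers", matching the behavior before this patch.  */
static int
profile_looks_lost (gcov_type callee_entry_count,
                    const gcov_type *call_bb_counts, int n_calls,
                    int min_ratio)
{
  gcov_type call_count = 0;
  int i;
  for (i = 0; i < n_calls; i++)
    call_count += call_bb_counts[i];

  if (min_ratio == 0)
    return callee_entry_count == 0 && call_count > 0;
  /* Flag when the call edges exceed the callee count by the ratio.  */
  return callee_entry_count * min_ratio <= call_count;
}
```

With the default ratio of 10, a callee whose entry count is 10 but whose call sites sum to 500 would be flagged, while one with entry count 100 would not.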
>
> This works well. Tested on omnetpp, which has some dropped profiles, and
> ensured that the behavior and output of the ipa tree profile phase is
> the same. Re-running bootstrap and regression tests.
>
> Here's the new patch. The only changes from the earlier patch are in
> handle_missing_profiles, where we now get the counts off of the entry
> and call stmt bbs, and in tree_profiling, where we call
> handle_missing_profiles earlier and I have removed the outlined cgraph
> rebuilding code since it doesn't need to be reinvoked.
>
> Honza, does this look ok for trunk when stage 1 reopens? David, I can
> send a similar patch for review to google-4_8 if it looks good.
>
> Thanks,
> Teresa
>
> 2014-02-12 Teresa Johnson <[email protected]>
>
> * graphite.c (graphite_finalize): Pass new parameter.
> * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
> * predict.c (tree_estimate_probability): New parameter.
> (tree_estimate_probability_worker): Renamed from
> tree_estimate_probability_driver.
> (tree_reestimate_probability): New function.
> (tree_estimate_probability_driver): Invoke
> tree_estimate_probability_worker.
> (freqs_to_counts): Move here from tree-inline.c.
> (drop_profile): Re-estimate profiles when dropping counts.
> (handle_missing_profiles): Drop for some non-zero functions as well,
> get counts from bbs to support invocation before cgraph rebuild.
> (counts_to_freqs): Remove code obviated by reestimation.
> * predict.h (tree_estimate_probability): Update declaration.
> * tree-inline.c (freqs_to_counts): Move to predict.c.
> (copy_cfg_body): Remove code obviated by reestimation.
> * tree-profile.c (tree_profiling): Invoke handle_missing_profiles
> before cgraph rebuild.
>
> Index: graphite.c
> ===================================================================
> --- graphite.c (revision 207436)
> +++ graphite.c (working copy)
> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
> cleanup_tree_cfg ();
> profile_status_for_fn (cfun) = PROFILE_ABSENT;
> release_recorded_exits ();
> - tree_estimate_probability ();
> + tree_estimate_probability (false);
> }
>
> cloog_state_free (cloog_state);
> Index: params.def
> ===================================================================
> --- params.def (revision 207436)
> +++ params.def (working copy)
> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
> "Maximal estimated outcome of branch considered predictable",
> 2, 0, 50)
>
> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
> + "min-caller-reestimate-ratio",
> + "Minimum caller-to-callee node count ratio to force reestimated branch "
> + "probabilities in callee (where 0 means only when callee count is 0)",
> + 10, 0, 0)
> +
> DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
> "inline-min-speedup",
> "The minimal estimated speedup allowing inliner to ignore
> inline-insns-single and inline-isnsns-auto",
> Index: predict.c
> ===================================================================
> --- predict.c (revision 207436)
> +++ predict.c (working copy)
> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>
> /* Predict branch probabilities and estimate profile of the tree CFG.
> This function can be called from the loop optimizers to recompute
> - the profile information. */
> + the profile information. When REDO is true then we are forcing
> + re-estimation of the probabilities because the profile was deemed
> + insufficient. */
>
> void
> -tree_estimate_probability (void)
> +tree_estimate_probability (bool redo)
> {
> basic_block bb;
>
> @@ -2390,7 +2392,8 @@ void
> connect_infinite_loops_to_exit ();
> /* We use loop_niter_by_eval, which requires that the loops have
> preheaders. */
> - create_preheaders (CP_SIMPLE_PREHEADERS);
> + if (!redo)
> + create_preheaders (CP_SIMPLE_PREHEADERS);
> calculate_dominance_info (CDI_POST_DOMINATORS);
>
> bb_predictions = pointer_map_create ();
> @@ -2412,16 +2415,16 @@ void
> pointer_map_destroy (bb_predictions);
> bb_predictions = NULL;
>
> - estimate_bb_frequencies (false);
> + estimate_bb_frequencies (redo);
> free_dominance_info (CDI_POST_DOMINATORS);
> remove_fake_exit_edges ();
> }
>
> /* Predict branch probabilities and estimate profile of the tree CFG.
> - This is the driver function for PASS_PROFILE. */
> + When REDO is true, we are forcing reestimation of the probabilities. */
>
> -static unsigned int
> -tree_estimate_probability_driver (void)
> +static void
> +tree_estimate_probability_worker (bool redo)
> {
> unsigned nb_loops;
>
> @@ -2435,7 +2438,7 @@ void
> if (nb_loops > 1)
> scev_initialize ();
>
> - tree_estimate_probability ();
> + tree_estimate_probability (redo);
>
> if (nb_loops > 1)
> scev_finalize ();
> @@ -2445,6 +2448,34 @@ void
> gimple_dump_cfg (dump_file, dump_flags);
> if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
> profile_status_for_fn (cfun) = PROFILE_GUESSED;
> +}
> +
> +/* Force re-estimation of the probabilities, because the profile was
> + deemed insufficient. */
> +
> +static void
> +tree_reestimate_probability (void)
> +{
> + basic_block bb;
> + edge e;
> + edge_iterator ei;
> +
> + /* Need to reset the counts to ensure probabilities are recomputed. */
> + FOR_EACH_BB_FN (bb, cfun)
> + {
> + bb->count = 0;
> + FOR_EACH_EDGE (e, ei, bb->succs)
> + e->count = 0;
> + }
> + tree_estimate_probability_worker (true);
> +}
> +
> +/* Estimate probabilities.
> + This is the driver function for PASS_PROFILE. */
> +static unsigned int
> +tree_estimate_probability_driver (void)
> +{
> + tree_estimate_probability_worker (false);
> return 0;
> }
> ^L
> @@ -2765,6 +2796,28 @@ estimate_loops (void)
> BITMAP_FREE (tovisit);
> }
>
> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
> + with each bb's frequency. Used when NODE has an entry count that
> + is much lower than the caller edges reaching it. See the comments
> + for handle_missing_profiles() for when this can happen for COMDATs. */
> +
> +void
> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
> +{
> + basic_block bb;
> + edge_iterator ei;
> + edge e;
> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> + FOR_ALL_BB_FN(bb, fn)
> + {
> + bb->count = apply_scale (count,
> + GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> + FOR_EACH_EDGE (e, ei, bb->succs)
> + e->count = apply_probability (e->src->count, e->probability);
> + }
> +}
> +
> /* Drop the profile for NODE to guessed, and update its frequency based on
> whether it is expected to be hot given the CALL_COUNT. */
>
> @@ -2772,6 +2825,9 @@ static void
> drop_profile (struct cgraph_node *node, gcov_type call_count)
> {
> struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> +
> + if (profile_status_for_fn (fn) == PROFILE_GUESSED)
> + return;
> /* In the case where this was called by another function with a
> dropped profile, call_count will be 0. Since there are no
> non-zero call counts to this function, we don't know for sure
> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>
> if (dump_file)
> fprintf (dump_file,
> - "Dropping 0 profile for %s/%i. %s based on calls.\n",
> + "Dropping %ld profile for %s/%i. %s based on calls.\n",
> + node->count,
> node->name (), node->order,
> hot ? "Function is hot" : "Function is normal");
> /* We only expect to miss profiles for functions that are reached
> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node, gcov_type
> node->name (), node->order);
> }
>
> + /* Re-estimate the probabilities for function and use the estimated
> + frequencies to compute the counts. */
> + push_cfun (DECL_STRUCT_FUNCTION (node->decl));
> + tree_reestimate_probability ();
> + freqs_to_counts (node, call_count);
> + if (dump_file)
> + {
> + fprintf (dump_file, "After re-estimating probabilities and counts\n");
> + gimple_dump_cfg (dump_file, dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
> + }
> + pop_cfun ();
> +
> profile_status_for_fn (fn)
> = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
> node->frequency
> @@ -2815,15 +2884,29 @@ drop_profile (struct cgraph_node *node, gcov_type
> /* In the case of COMDAT routines, multiple object files will contain the
> same
> function and the linker will select one for the binary. In that case
> all the other copies from the profile instrument binary will be missing
> - profile counts. Look for cases where this happened, due to non-zero
> + profile counts. This can confuse downstream optimizations such as
> + function splitting.
> +
> + Look for cases where this happened, due to non-zero
> call counts going to 0-count functions, and drop the profile to guessed
> so that we can use the estimated probabilities and avoid optimizing only
> - for size.
> + for size. In the case where the COMDAT was inlined in some locations
> + within the file and not others, the profile count will be non-zero due
> + to the inlined instances, but may still be significantly smaller than the
> + call edges for the non-inlined instances. Detect that case when requested
> + and reestimate probabilities, since the counts will not necessarily reflect
> + the behavior along the more frequent call paths.
>
> The other case where the profile may be missing is when the routine
> is not going to be emitted to the object file, e.g. for "extern template"
> class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
> - all other cases of non-zero calls to 0-count functions. */
> + all other cases of non-zero calls to 0-count functions.
> +
> + This is now invoked before rebuilding the cgraph after reading profile
> + counts, so the cgraph edge and node counts are still 0. Therefore we
> + need to look at the counts on the entry bbs and the call stmt bbs.
> + That way we can avoid needing to rebuild the cgraph again to reflect
> + the nodes with dropped profiles. */
>
> void
> handle_missing_profiles (void)
> @@ -2832,9 +2915,11 @@ handle_missing_profiles (void)
> int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
> vec<struct cgraph_node *> worklist;
> worklist.create (64);
> + int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>
> - /* See if 0 count function has non-0 count callers. In this case we
> - lost some profile. Drop its function profile to PROFILE_GUESSED. */
> + /* See if 0 or low count function has higher count caller edges. In this
> + case we lost some profile. Drop its function profile to
> + PROFILE_GUESSED. */
> FOR_EACH_DEFINED_FUNCTION (node)
> {
> struct cgraph_edge *e;
> @@ -2842,48 +2927,75 @@ handle_missing_profiles (void)
> gcov_type max_tp_first_run = 0;
> struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>
> - if (node->count)
> - continue;
> for (e = node->callers; e; e = e->next_caller)
> {
> - call_count += e->count;
> + call_count += gimple_bb (e->call_stmt)->count;
>
> if (e->caller->tp_first_run > max_tp_first_run)
> max_tp_first_run = e->caller->tp_first_run;
> }
Should non-comdat functions be skipped?
>
> + if (!fn || !fn->cfg)
> + continue;
> +
> + gcov_type node_count = ENTRY_BLOCK_PTR_FOR_FN (fn)->count;
> +
> + /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
> + profiles for 0-count functions called by non-zero call edges. */
> + if ((!min_reest_ratio && node_count > 0)
> + || (min_reest_ratio && node_count * min_reest_ratio > call_count))
> + continue;
> +
> /* If time profile is missing, let assign the maximum that comes from
> caller functions. */
> if (!node->tp_first_run && max_tp_first_run)
> node->tp_first_run = max_tp_first_run + 1;
>
> if (call_count
> - && fn && fn->cfg
> && (call_count * unlikely_count_fraction >= profile_info->runs))
> {
> drop_profile (node, call_count);
> worklist.safe_push (node);
> }
> }
> -
> - /* Propagate the profile dropping to other 0-count COMDATs that are
> + /* Propagate the profile dropping to other low-count COMDATs that are
> potentially called by COMDATs we already dropped the profile on. */
> while (worklist.length () > 0)
> {
> struct cgraph_edge *e;
>
> node = worklist.pop ();
> + struct function *node_fn = DECL_STRUCT_FUNCTION (node->decl);
> + gcc_assert (node_fn && node_fn->cfg);
> + gcov_type node_count = ENTRY_BLOCK_PTR_FOR_FN (node_fn)->count;
> for (e = node->callees; e; e = e->next_caller)
> {
> struct cgraph_node *callee = e->callee;
> struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
> + if (!fn || !fn->cfg)
> + continue;
> + gcov_type callee_count = ENTRY_BLOCK_PTR_FOR_FN (fn)->count;
>
> - if (callee->count > 0)
> + /* When min_reest_ratio is non-zero, if we get here we dropped
> + a caller's profile since it was significantly smaller than its
> + call edge. Drop the profile on any callees whose node count is
> + now exceeded by the new caller node count. */
> + if ((!min_reest_ratio && callee_count > 0)
> + || (min_reest_ratio && callee_count >= node_count))
> continue;
> - if (DECL_COMDAT (callee->decl) && fn && fn->cfg
> +
> + gcov_type call_count = 0;
> + if (min_reest_ratio > 0)
> + {
> + struct cgraph_edge *e2;
> + for (e2 = node->callers; e2; e2 = e2->next_caller)
> + call_count += gimple_bb (e2->call_stmt)->count;
> + }
> +
> + if (DECL_COMDAT (callee->decl)
> && profile_status_for_fn (fn) == PROFILE_READ)
> {
> - drop_profile (node, 0);
> + drop_profile (node, call_count);
> worklist.safe_push (callee);
> }
Should the comdat check be done earlier?
David
> }
> @@ -2900,12 +3012,6 @@ counts_to_freqs (void)
> gcov_type count_max, true_count_max = 0;
> basic_block bb;
>
> - /* Don't overwrite the estimated frequencies when the profile for
> - the function is missing. We may drop this function PROFILE_GUESSED
> - later in drop_profile (). */
> - if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
> - return 0;
> -
> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
> true_count_max = MAX (bb->count, true_count_max);
>
> Index: predict.h
> ===================================================================
> --- predict.h (revision 207436)
> +++ predict.h (working copy)
> @@ -51,7 +51,7 @@ extern void handle_missing_profiles (void);
> extern void estimate_bb_frequencies (bool);
> extern const char *predictor_name (enum br_predictor);
> extern tree build_predict_expr (enum br_predictor, enum prediction);
> -extern void tree_estimate_probability (void);
> +extern void tree_estimate_probability (bool);
> extern void compute_function_frequency (void);
> extern void rebuild_frequencies (void);
>
> Index: tree-inline.c
> ===================================================================
> --- tree-inline.c (revision 207436)
> +++ tree-inline.c (working copy)
> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id, basic_blo
> }
> }
>
> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
> - with each bb's frequency. Used when NODE has a 0-weight entry
> - but we are about to inline it into a non-zero count call bb.
> - See the comments for handle_missing_profiles() in predict.c for
> - when this can happen for COMDATs. */
> -
> -void
> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
> -{
> - basic_block bb;
> - edge_iterator ei;
> - edge e;
> - struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
> -
> - FOR_ALL_BB_FN(bb, fn)
> - {
> - bb->count = apply_scale (count,
> - GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
> - FOR_EACH_EDGE (e, ei, bb->succs)
> - e->count = apply_probability (e->src->count, e->probability);
> - }
> -}
> -
> /* Make a copy of the body of FN so that it can be inserted inline in
> another function. Walks FN via CFG, returns new fndecl. */
>
> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
> int incoming_frequency = 0;
> gcov_type incoming_count = 0;
>
> - /* This can happen for COMDAT routines that end up with 0 counts
> - despite being called (see the comments for handle_missing_profiles()
> - in predict.c as to why). Apply counts to the blocks in the callee
> - before inlining, using the guessed edge frequencies, so that we don't
> - end up with a 0-count inline body which can confuse downstream
> - optimizations such as function splitting. */
> - if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
> - {
> - /* Apply the larger of the call bb count and the total incoming
> - call edge count to the callee. */
> - gcov_type in_count = 0;
> - struct cgraph_edge *in_edge;
> - for (in_edge = id->src_node->callers; in_edge;
> - in_edge = in_edge->next_caller)
> - in_count += in_edge->count;
> - freqs_to_counts (id->src_node, count > in_count ? count : in_count);
> - }
> -
> if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
> count_scale
> = GCOV_COMPUTE_SCALE (count,
> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
> else
> count_scale = REG_BR_PROB_BASE;
>
> + if (dump_file && (dump_flags & TDF_DETAILS))
> + fprintf (dump_file,
> + "Scaling entry count %ld to %ld with scale %ld while inlining "
> + "%s into %s\n",
> + count, ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count, count_scale,
> + id->src_node->name (), id->dst_node->name ());
> +
> /* Register specific tree functions. */
> gimple_register_cfg_hooks ();
>
> Index: tree-profile.c
> ===================================================================
> --- tree-profile.c (revision 207436)
> +++ tree-profile.c (working copy)
> @@ -621,6 +621,8 @@ tree_profiling (void)
> cgraph_set_pure_flag (node, false, false);
> }
>
> + handle_missing_profiles ();
> +
> /* Update call statements and rebuild the cgraph. */
> FOR_EACH_DEFINED_FUNCTION (node)
> {
> @@ -657,8 +659,6 @@ tree_profiling (void)
> pop_cfun ();
> }
>
> - handle_missing_profiles ();
> -
> del_node_map ();
> return 0;
> }
>
>>
>> Thanks,
>> Teresa
>>
>>>
>>> David
>>>>
>>>> Teresa
>>>>
>>>>>
>>>>> David
>>>>>
>>>>> On Tue, Feb 11, 2014 at 5:04 PM, Teresa Johnson <[email protected]>
>>>>> wrote:
>>>>>> On Tue, Feb 11, 2014 at 2:56 PM, Xinliang David Li <[email protected]>
>>>>>> wrote:
>>>>>>> Is it better to add some logic in counts_to_freq to determine if the
>>>>>>> profile count needs to be dropped completely to force profile
>>>>>>> estimation?
>>>>>>
>>>>>> This is the problem I was mentioning below where we call
>>>>>> counts_to_freqs before we have the cgraph and can tell that we will
>>>>>> need to drop the profile. When we were only dropping the profile for
>>>>>> functions with all 0 counts, we simply avoided doing the
>>>>>> counts_to_freqs when the counts were all 0, which works since the 0
>>>>>> counts don't leave things in an inconsistent state (counts vs
>>>>>> estimated frequencies).
>>>>>>
>>>>>> Teresa
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>>
>>>>>>> On Mon, Feb 10, 2014 at 2:12 PM, Teresa Johnson <[email protected]>
>>>>>>> wrote:
>>>>>>>> This patch attempts to address the lost profile issue for COMDATs in
>>>>>>>> more circumstances, exposed by function splitting.
>>>>>>>>
>>>>>>>> My earlier patch handled the case where the comdat had 0 counts since
>>>>>>>> the linker kept the copy in a different module. In that case we
>>>>>>>> prevent the guessed frequencies on 0-count functions from being
>>>>>>>> dropped by counts_to_freq, and then later mark any reached via
>>>>>>>> non-zero callgraph edges as guessed. Finally, when one such 0-count
>>>>>>>> comdat is inlined the call count is propagated to the callee blocks
>>>>>>>> using the guessed probabilities.
>>>>>>>>
>>>>>>>> However, in this case, there was a comdat that had a very small
>>>>>>>> non-zero count, that was being inlined to a much hotter callsite. This
>>>>>>>> could happen when there was a copy that was ipa-inlined
>>>>>>>> in the profile gen compile, so the copy in that module gets some
>>>>>>>> non-zero counts from the ipa-inlined instance, while the out-of-line
>>>>>>>> copy was eliminated by the linker (which selected the copy from a
>>>>>>>> different module). In this case the inliner was scaling the bb counts up quite a
>>>>>>>> lot when inlining. The problem is that you most likely can't trust
>>>>>>>> that the 0 count bbs in such a case are really not executed by the
>>>>>>>> callsite it is being inlined into, since the counts are very small and
>>>>>>>> correspond to a different callsite. In some internal C++ applications
>>>>>>>> I am seeing that we execute out of the split cold portion of COMDATs
>>>>>>>> for this reason.
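[Editorial sketch] The scaling blow-up described above can be illustrated numerically. This is a simplified model under assumed names (scale_block_count is hypothetical; REG_BR_PROB_BASE and the fixed-point division mirror how GCC's inliner computes count_scale), not the actual GCC code:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t gcov_type;
#define REG_BR_PROB_BASE 10000

/* Simplified model of the inliner's count scaling: the callee body is
   scaled by call_count / callee_entry_count.  A tiny entry count from a
   stale COMDAT profile yields a huge scale factor, while 0-count blocks
   stay 0 even though the hot caller may well execute them.  */
static gcov_type
scale_block_count (gcov_type bb_count, gcov_type call_count,
                   gcov_type callee_entry_count)
{
  gcov_type count_scale;
  if (callee_entry_count)
    count_scale = call_count * REG_BR_PROB_BASE / callee_entry_count;
  else
    count_scale = REG_BR_PROB_BASE;
  return bb_count * count_scale / REG_BR_PROB_BASE;
}
```

For example, a callee with entry count 2 inlined at a call site with count 10000 is scaled by 5000x: a block with count 1 becomes 5000, while a block with count 0 stays 0, which is exactly the untrustworthy split between "hot" and "cold" blocks described above.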
>>>>>>>>
>>>>>>>> This problem is more complicated to address than the 0-count instance,
>>>>>>>> because we need the cgraph to determine which functions to drop the
>>>>>>>> profile on, and at that point the estimated frequencies have already
>>>>>>>> been overwritten by counts_to_freqs. To handle this broader case, I
>>>>>>>> have changed the drop_profile routine to simply re-estimate the
>>>>>>>> probabilities/frequencies (and translate these into counts scaled from
>>>>>>>> the incoming call edge counts). This unfortunately necessitates
>>>>>>>> rebuilding the cgraph, to propagate the new synthesized counts and
>>>>>>>> avoid checking failures downstream. But it will only be rebuilt if we
>>>>>>>> dropped any profiles. With this solution, some of the older approach
>>>>>>>> can be removed (e.g. propagating counts using the guessed
>>>>>>>> probabilities during inlining).
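[Editorial sketch] The fix can be sketched as: reset the counts, re-estimate branch probabilities, then scale each block's guessed frequency by the incoming call count. The helper below is a simplified stand-in for the patch's freqs_to_counts (BB_FREQ_MAX as in GCC; the name freq_to_count is an assumption for illustration):

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t gcov_type;
#define BB_FREQ_MAX 10000

/* Simplified stand-in for freqs_to_counts: synthesize a block count
   from its estimated frequency (0..BB_FREQ_MAX, with the entry block
   at BB_FREQ_MAX) and the count observed on the incoming call edges.  */
static gcov_type
freq_to_count (int bb_frequency, gcov_type call_count)
{
  return call_count * bb_frequency / BB_FREQ_MAX;
}
```

The entry block (frequency BB_FREQ_MAX) receives the full call count, and a block guessed to execute half the time receives half of it, so the synthesized counts are consistent with the estimated probabilities rather than with the stale profile.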
>>>>>>>>
>>>>>>>> Patch is below. Bootstrapped and tested on x86-64-unknown-linux-gnu.
>>>>>>>> Also tested on
>>>>>>>> a profile-use build of SPEC cpu2006. Ok for trunk when stage 1 reopens?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Teresa
>>>>>>>>
>>>>>>>> 2014-02-10 Teresa Johnson <[email protected]>
>>>>>>>>
>>>>>>>> * graphite.c (graphite_finalize): Pass new parameter.
>>>>>>>> * params.def (PARAM_MIN_CALLER_REESTIMATE_RATIO): New.
>>>>>>>> * predict.c (tree_estimate_probability): New parameter.
>>>>>>>> (tree_estimate_probability_worker): Renamed from
>>>>>>>> tree_estimate_probability_driver.
>>>>>>>> (tree_reestimate_probability): New function.
>>>>>>>> (tree_estimate_probability_driver): Invoke
>>>>>>>> tree_estimate_probability_worker.
>>>>>>>> (freqs_to_counts): Move from tree-inline.c.
>>>>>>>> (drop_profile): Re-estimate profiles when dropping counts.
>>>>>>>> (handle_missing_profiles): Drop for some non-zero functions as
>>>>>>>> well.
>>>>>>>> (counts_to_freqs): Remove code obviated by reestimation.
>>>>>>>> * predict.h (handle_missing_profiles): Update declaration.
>>>>>>>> (tree_estimate_probability): Ditto.
>>>>>>>> * tree-inline.c (freqs_to_counts): Move to predict.c.
>>>>>>>> (copy_cfg_body): Remove code obviated by reestimation.
>>>>>>>> * tree-profile.c (gimple_gen_ior_profiler):
>>>>>>>> (rebuild_cgraph): Code extracted from tree_profiling to
>>>>>>>> rebuild cgraph.
>>>>>>>> (tree_profiling): Invoke rebuild_cgraph as needed.
>>>>>>>>
>>>>>>>> Index: graphite.c
>>>>>>>> ===================================================================
>>>>>>>> --- graphite.c (revision 207436)
>>>>>>>> +++ graphite.c (working copy)
>>>>>>>> @@ -247,7 +247,7 @@ graphite_finalize (bool need_cfg_cleanup_p)
>>>>>>>> cleanup_tree_cfg ();
>>>>>>>> profile_status_for_fn (cfun) = PROFILE_ABSENT;
>>>>>>>> release_recorded_exits ();
>>>>>>>> - tree_estimate_probability ();
>>>>>>>> + tree_estimate_probability (false);
>>>>>>>> }
>>>>>>>>
>>>>>>>> cloog_state_free (cloog_state);
>>>>>>>> Index: params.def
>>>>>>>> ===================================================================
>>>>>>>> --- params.def (revision 207436)
>>>>>>>> +++ params.def (working copy)
>>>>>>>> @@ -44,6 +44,12 @@ DEFPARAM (PARAM_PREDICTABLE_BRANCH_OUTCOME,
>>>>>>>> "Maximal estimated outcome of branch considered predictable",
>>>>>>>> 2, 0, 50)
>>>>>>>>
>>>>>>>> +DEFPARAM (PARAM_MIN_CALLER_REESTIMATE_RATIO,
>>>>>>>> + "min-caller-reestimate-ratio",
>>>>>>>> + "Minimum caller-to-callee node count ratio to force
>>>>>>>> reestimated branch "
>>>>>>>> + "probabilities in callee (where 0 means only when callee
>>>>>>>> count is 0)",
>>>>>>>> + 10, 0, 0)
>>>>>>>> +
>>>>>>>> DEFPARAM (PARAM_INLINE_MIN_SPEEDUP,
>>>>>>>> "inline-min-speedup",
>>>>>>>> "The minimal estimated speedup allowing inliner to ignore
>>>>>>>> inline-insns-single and inline-isnsns-auto",
>>>>>>>> Index: predict.c
>>>>>>>> ===================================================================
>>>>>>>> --- predict.c (revision 207436)
>>>>>>>> +++ predict.c (working copy)
>>>>>>>> @@ -2379,10 +2379,12 @@ tree_estimate_probability_bb (basic_block bb)
>>>>>>>>
>>>>>>>> /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>>>> This function can be called from the loop optimizers to recompute
>>>>>>>> - the profile information. */
>>>>>>>> + the profile information. When REDO is true then we are forcing
>>>>>>>> + re-estimation of the probabilities because the profile was deemed
>>>>>>>> + insufficient. */
>>>>>>>>
>>>>>>>> void
>>>>>>>> -tree_estimate_probability (void)
>>>>>>>> +tree_estimate_probability (bool redo)
>>>>>>>> {
>>>>>>>> basic_block bb;
>>>>>>>>
>>>>>>>> @@ -2390,7 +2392,8 @@ void
>>>>>>>> connect_infinite_loops_to_exit ();
>>>>>>>> /* We use loop_niter_by_eval, which requires that the loops have
>>>>>>>> preheaders. */
>>>>>>>> - create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>>>> + if (!redo)
>>>>>>>> + create_preheaders (CP_SIMPLE_PREHEADERS);
>>>>>>>> calculate_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>>
>>>>>>>> bb_predictions = pointer_map_create ();
>>>>>>>> @@ -2412,16 +2415,16 @@ void
>>>>>>>> pointer_map_destroy (bb_predictions);
>>>>>>>> bb_predictions = NULL;
>>>>>>>>
>>>>>>>> - estimate_bb_frequencies (false);
>>>>>>>> + estimate_bb_frequencies (redo);
>>>>>>>> free_dominance_info (CDI_POST_DOMINATORS);
>>>>>>>> remove_fake_exit_edges ();
>>>>>>>> }
>>>>>>>>
>>>>>>>> /* Predict branch probabilities and estimate profile of the tree CFG.
>>>>>>>> - This is the driver function for PASS_PROFILE. */
>>>>>>>> + When REDO is true, we are forcing reestimation of the
>>>>>>>> probabilities. */
>>>>>>>>
>>>>>>>> -static unsigned int
>>>>>>>> -tree_estimate_probability_driver (void)
>>>>>>>> +static void
>>>>>>>> +tree_estimate_probability_worker (bool redo)
>>>>>>>> {
>>>>>>>> unsigned nb_loops;
>>>>>>>>
>>>>>>>> @@ -2435,7 +2438,7 @@ void
>>>>>>>> if (nb_loops > 1)
>>>>>>>> scev_initialize ();
>>>>>>>>
>>>>>>>> - tree_estimate_probability ();
>>>>>>>> + tree_estimate_probability (redo);
>>>>>>>>
>>>>>>>> if (nb_loops > 1)
>>>>>>>> scev_finalize ();
>>>>>>>> @@ -2445,6 +2448,34 @@ void
>>>>>>>> gimple_dump_cfg (dump_file, dump_flags);
>>>>>>>> if (profile_status_for_fn (cfun) == PROFILE_ABSENT)
>>>>>>>> profile_status_for_fn (cfun) = PROFILE_GUESSED;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Force re-estimation of the probabilities, because the profile was
>>>>>>>> + deemed insufficient. */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +tree_reestimate_probability (void)
>>>>>>>> +{
>>>>>>>> + basic_block bb;
>>>>>>>> + edge e;
>>>>>>>> + edge_iterator ei;
>>>>>>>> +
>>>>>>>> + /* Need to reset the counts to ensure probabilities are recomputed.
>>>>>>>> */
>>>>>>>> + FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> + {
>>>>>>>> + bb->count = 0;
>>>>>>>> + FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> + e->count = 0;
>>>>>>>> + }
>>>>>>>> + tree_estimate_probability_worker (true);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* Estimate probabilities.
>>>>>>>> + This is the driver function for PASS_PROFILE. */
>>>>>>>> +static unsigned int
>>>>>>>> +tree_estimate_probability_driver (void)
>>>>>>>> +{
>>>>>>>> + tree_estimate_probability_worker (false);
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>> ^L
>>>>>>>> @@ -2765,6 +2796,28 @@ estimate_loops (void)
>>>>>>>> BITMAP_FREE (tovisit);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>>>> + with each bb's frequency. Used when NODE has an entry count that
>>>>>>>> + is much lower than the caller edges reaching it. See the comments
>>>>>>>> + for handle_missing_profiles() for when this can happen for
>>>>>>>> COMDATs. */
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>>>> +{
>>>>>>>> + basic_block bb;
>>>>>>>> + edge_iterator ei;
>>>>>>>> + edge e;
>>>>>>>> + struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> +
>>>>>>>> + FOR_ALL_BB_FN(bb, fn)
>>>>>>>> + {
>>>>>>>> + bb->count = apply_scale (count,
>>>>>>>> + GCOV_COMPUTE_SCALE (bb->frequency,
>>>>>>>> BB_FREQ_MAX));
>>>>>>>> + FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> + e->count = apply_probability (e->src->count, e->probability);
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> /* Drop the profile for NODE to guessed, and update its frequency
>>>>>>>> based on
>>>>>>>> whether it is expected to be hot given the CALL_COUNT. */
>>>>>>>>
>>>>>>>> @@ -2772,6 +2825,9 @@ static void
>>>>>>>> drop_profile (struct cgraph_node *node, gcov_type call_count)
>>>>>>>> {
>>>>>>>> struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> +
>>>>>>>> + if (profile_status_for_fn (fn) == PROFILE_GUESSED)
>>>>>>>> + return;
>>>>>>>> /* In the case where this was called by another function with a
>>>>>>>> dropped profile, call_count will be 0. Since there are no
>>>>>>>> non-zero call counts to this function, we don't know for sure
>>>>>>>> @@ -2780,7 +2836,8 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>>>
>>>>>>>> if (dump_file)
>>>>>>>> fprintf (dump_file,
>>>>>>>> - "Dropping 0 profile for %s/%i. %s based on calls.\n",
>>>>>>>> + "Dropping %ld profile for %s/%i. %s based on calls.\n",
>>>>>>>> + node->count,
>>>>>>>> node->name (), node->order,
>>>>>>>> hot ? "Function is hot" : "Function is normal");
>>>>>>>> /* We only expect to miss profiles for functions that are reached
>>>>>>>> @@ -2806,6 +2863,18 @@ drop_profile (struct cgraph_node *node,
>>>>>>>> gcov_type
>>>>>>>> node->name (), node->order);
>>>>>>>> }
>>>>>>>>
>>>>>>>> + /* Re-estimate the probabilities for the function and use the estimated
>>>>>>>> + frequencies to compute the counts. */
>>>>>>>> + push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> + tree_reestimate_probability ();
>>>>>>>> + freqs_to_counts (node, call_count);
>>>>>>>> + if (dump_file)
>>>>>>>> + {
>>>>>>>> + fprintf (dump_file, "After re-estimating probabilities and counts\n");
>>>>>>>> + gimple_dump_cfg (dump_file,
>>>>>>>> + dump_flags|TDF_DETAILS|TDF_BLOCKS|TDF_LINENO|TDF_STATS);
>>>>>>>> + }
>>>>>>>> + pop_cfun ();
>>>>>>>> +
>>>>>>>> profile_status_for_fn (fn)
>>>>>>>> = (flag_guess_branch_prob ? PROFILE_GUESSED : PROFILE_ABSENT);
>>>>>>>> node->frequency
>>>>>>>> @@ -2815,26 +2884,37 @@ drop_profile (struct cgraph_node *node, gcov_type
>>>>>>>> /* In the case of COMDAT routines, multiple object files will contain the same
>>>>>>>> function and the linker will select one for the binary. In that case
>>>>>>>> all the other copies from the profile instrument binary will be missing
>>>>>>>> - profile counts. Look for cases where this happened, due to non-zero
>>>>>>>> + profile counts. This can confuse downstream optimizations such as
>>>>>>>> + function splitting.
>>>>>>>> +
>>>>>>>> + Look for cases where this happened, due to non-zero
>>>>>>>> call counts going to 0-count functions, and drop the profile to guessed
>>>>>>>> so that we can use the estimated probabilities and avoid optimizing only
>>>>>>>> - for size.
>>>>>>>> + for size. In the case where the COMDAT was inlined in some locations
>>>>>>>> + within the file and not others, the profile count will be non-zero due
>>>>>>>> + to the inlined instances, but may still be significantly smaller than the
>>>>>>>> + call edges for the non-inlined instances. Detect that case when requested
>>>>>>>> + and reestimate probabilities, since the counts will not necessarily reflect
>>>>>>>> + the behavior along the more frequent call paths.
>>>>>>>>
>>>>>>>> The other case where the profile may be missing is when the routine
>>>>>>>> is not going to be emitted to the object file, e.g. for "extern template"
>>>>>>>> class methods. Those will be marked DECL_EXTERNAL. Emit a warning in
>>>>>>>> all other cases of non-zero calls to 0-count functions. */
>>>>>>>>
>>>>>>>> -void
>>>>>>>> +bool
>>>>>>>> handle_missing_profiles (void)
>>>>>>>> {
>>>>>>>> struct cgraph_node *node;
>>>>>>>> int unlikely_count_fraction = PARAM_VALUE (UNLIKELY_BB_COUNT_FRACTION);
>>>>>>>> vec<struct cgraph_node *> worklist;
>>>>>>>> worklist.create (64);
>>>>>>>> + int min_reest_ratio = PARAM_VALUE (PARAM_MIN_CALLER_REESTIMATE_RATIO);
>>>>>>>> + bool changed = false;
>>>>>>>>
>>>>>>>> - /* See if 0 count function has non-0 count callers. In this case we
>>>>>>>> - lost some profile. Drop its function profile to PROFILE_GUESSED. */
>>>>>>>> + /* See if 0 or low count function has higher count caller edges. In this
>>>>>>>> + case we lost some profile. Drop its function profile to
>>>>>>>> + PROFILE_GUESSED. */
>>>>>>>> FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>> {
>>>>>>>> struct cgraph_edge *e;
>>>>>>>> @@ -2842,8 +2922,6 @@ handle_missing_profiles (void)
>>>>>>>> gcov_type max_tp_first_run = 0;
>>>>>>>> struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>>
>>>>>>>> - if (node->count)
>>>>>>>> - continue;
>>>>>>>> for (e = node->callers; e; e = e->next_caller)
>>>>>>>> {
>>>>>>>> call_count += e->count;
>>>>>>>> @@ -2852,6 +2930,12 @@ handle_missing_profiles (void)
>>>>>>>> max_tp_first_run = e->caller->tp_first_run;
>>>>>>>> }
>>>>>>>>
>>>>>>>> + /* When the PARAM_MIN_CALLER_REESTIMATE_RATIO is 0, then we only drop
>>>>>>>> + profiles for 0-count functions called by non-zero call edges. */
>>>>>>>> + if ((!min_reest_ratio && node->count > 0)
>>>>>>>> + || (min_reest_ratio && node->count * min_reest_ratio > call_count))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> /* If time profile is missing, let assign the maximum that comes from
>>>>>>>> caller functions. */
>>>>>>>> if (!node->tp_first_run && max_tp_first_run)
>>>>>>>> @@ -2862,11 +2946,12 @@ handle_missing_profiles (void)
>>>>>>>> && (call_count * unlikely_count_fraction >= profile_info->runs))
>>>>>>>> {
>>>>>>>> drop_profile (node, call_count);
>>>>>>>> + changed = true;
>>>>>>>> worklist.safe_push (node);
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> - /* Propagate the profile dropping to other 0-count COMDATs that are
>>>>>>>> + /* Propagate the profile dropping to other low-count COMDATs that are
>>>>>>>> potentially called by COMDATs we already dropped the profile on. */
>>>>>>>> while (worklist.length () > 0)
>>>>>>>> {
>>>>>>>> @@ -2878,17 +2963,33 @@ handle_missing_profiles (void)
>>>>>>>> struct cgraph_node *callee = e->callee;
>>>>>>>> struct function *fn = DECL_STRUCT_FUNCTION (callee->decl);
>>>>>>>>
>>>>>>>> - if (callee->count > 0)
>>>>>>>> + /* When min_reest_ratio is non-zero, if we get here we dropped
>>>>>>>> + a caller's profile since it was significantly smaller than its
>>>>>>>> + call edge. Drop the profile on any callees whose node count is
>>>>>>>> + now exceeded by the new caller node count. */
>>>>>>>> + if ((!min_reest_ratio && callee->count > 0)
>>>>>>>> + || (min_reest_ratio && callee->count >= node->count))
>>>>>>>> continue;
>>>>>>>> +
>>>>>>>> + gcov_type call_count = 0;
>>>>>>>> + if (min_reest_ratio > 0)
>>>>>>>> + {
>>>>>>>> + struct cgraph_edge *e2;
>>>>>>>> + for (e2 = callee->callers; e2; e2 = e2->next_caller)
>>>>>>>> + call_count += e2->count;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> if (DECL_COMDAT (callee->decl) && fn && fn->cfg
>>>>>>>> && profile_status_for_fn (fn) == PROFILE_READ)
>>>>>>>> {
>>>>>>>> - drop_profile (node, 0);
>>>>>>>> + drop_profile (callee, call_count);
>>>>>>>> + changed = true;
>>>>>>>> worklist.safe_push (callee);
>>>>>>>> }
>>>>>>>> }
>>>>>>>> }
>>>>>>>> worklist.release ();
>>>>>>>> + return changed;
>>>>>>>> }
>>>>>>>>
>>>>>>>> /* Convert counts measured by profile driven feedback to frequencies.
>>>>>>>> @@ -2900,12 +3001,6 @@ counts_to_freqs (void)
>>>>>>>> gcov_type count_max, true_count_max = 0;
>>>>>>>> basic_block bb;
>>>>>>>>
>>>>>>>> - /* Don't overwrite the estimated frequencies when the profile for
>>>>>>>> - the function is missing. We may drop this function PROFILE_GUESSED
>>>>>>>> - later in drop_profile (). */
>>>>>>>> - if (!ENTRY_BLOCK_PTR_FOR_FN (cfun)->count)
>>>>>>>> - return 0;
>>>>>>>> -
>>>>>>>> FOR_BB_BETWEEN (bb, ENTRY_BLOCK_PTR_FOR_FN (cfun), NULL, next_bb)
>>>>>>>> true_count_max = MAX (bb->count, true_count_max);
>>>>>>>>
>>>>>>>> Index: predict.h
>>>>>>>> ===================================================================
>>>>>>>> --- predict.h (revision 207436)
>>>>>>>> +++ predict.h (working copy)
>>>>>>>> @@ -47,11 +47,11 @@ enum prediction
>>>>>>>>
>>>>>>>> extern void predict_insn_def (rtx, enum br_predictor, enum
>>>>>>>> prediction);
>>>>>>>> extern int counts_to_freqs (void);
>>>>>>>> -extern void handle_missing_profiles (void);
>>>>>>>> +extern bool handle_missing_profiles (void);
>>>>>>>> extern void estimate_bb_frequencies (bool);
>>>>>>>> extern const char *predictor_name (enum br_predictor);
>>>>>>>> extern tree build_predict_expr (enum br_predictor, enum prediction);
>>>>>>>> -extern void tree_estimate_probability (void);
>>>>>>>> +extern void tree_estimate_probability (bool);
>>>>>>>> extern void compute_function_frequency (void);
>>>>>>>> extern void rebuild_frequencies (void);
>>>>>>>>
>>>>>>>> Index: tree-inline.c
>>>>>>>> ===================================================================
>>>>>>>> --- tree-inline.c (revision 207436)
>>>>>>>> +++ tree-inline.c (working copy)
>>>>>>>> @@ -2384,29 +2384,6 @@ redirect_all_calls (copy_body_data * id,
>>>>>>>> basic_blo
>>>>>>>> }
>>>>>>>> }
>>>>>>>>
>>>>>>>> -/* Convert estimated frequencies into counts for NODE, scaling COUNT
>>>>>>>> - with each bb's frequency. Used when NODE has a 0-weight entry
>>>>>>>> - but we are about to inline it into a non-zero count call bb.
>>>>>>>> - See the comments for handle_missing_profiles() in predict.c for
>>>>>>>> - when this can happen for COMDATs. */
>>>>>>>> -
>>>>>>>> -void
>>>>>>>> -freqs_to_counts (struct cgraph_node *node, gcov_type count)
>>>>>>>> -{
>>>>>>>> - basic_block bb;
>>>>>>>> - edge_iterator ei;
>>>>>>>> - edge e;
>>>>>>>> - struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
>>>>>>>> -
>>>>>>>> - FOR_ALL_BB_FN(bb, fn)
>>>>>>>> - {
>>>>>>>> - bb->count = apply_scale (count,
>>>>>>>> - GCOV_COMPUTE_SCALE (bb->frequency, BB_FREQ_MAX));
>>>>>>>> - FOR_EACH_EDGE (e, ei, bb->succs)
>>>>>>>> - e->count = apply_probability (e->src->count, e->probability);
>>>>>>>> - }
>>>>>>>> -}
>>>>>>>> -
>>>>>>>> /* Make a copy of the body of FN so that it can be inserted inline in
>>>>>>>> another function. Walks FN via CFG, returns new fndecl. */
>>>>>>>>
>>>>>>>> @@ -2427,24 +2404,6 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>>> int incoming_frequency = 0;
>>>>>>>> gcov_type incoming_count = 0;
>>>>>>>>
>>>>>>>> - /* This can happen for COMDAT routines that end up with 0 counts
>>>>>>>> - despite being called (see the comments for handle_missing_profiles()
>>>>>>>> - in predict.c as to why). Apply counts to the blocks in the callee
>>>>>>>> - before inlining, using the guessed edge frequencies, so that we don't
>>>>>>>> - end up with a 0-count inline body which can confuse downstream
>>>>>>>> - optimizations such as function splitting. */
>>>>>>>> - if (!ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count && count)
>>>>>>>> - {
>>>>>>>> - /* Apply the larger of the call bb count and the total incoming
>>>>>>>> - call edge count to the callee. */
>>>>>>>> - gcov_type in_count = 0;
>>>>>>>> - struct cgraph_edge *in_edge;
>>>>>>>> - for (in_edge = id->src_node->callers; in_edge;
>>>>>>>> - in_edge = in_edge->next_caller)
>>>>>>>> - in_count += in_edge->count;
>>>>>>>> - freqs_to_counts (id->src_node, count > in_count ? count : in_count);
>>>>>>>> - }
>>>>>>>> -
>>>>>>>> if (ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count)
>>>>>>>> count_scale
>>>>>>>> = GCOV_COMPUTE_SCALE (count,
>>>>>>>> @@ -2452,6 +2411,13 @@ copy_cfg_body (copy_body_data * id, gcov_type coun
>>>>>>>> else
>>>>>>>> count_scale = REG_BR_PROB_BASE;
>>>>>>>>
>>>>>>>> + if (dump_file && (dump_flags & TDF_DETAILS))
>>>>>>>> + fprintf (dump_file,
>>>>>>>> + "Scaling entry count " HOST_WIDEST_INT_PRINT_DEC " to "
>>>>>>>> + HOST_WIDEST_INT_PRINT_DEC " with scale " HOST_WIDEST_INT_PRINT_DEC
>>>>>>>> + " while inlining %s into %s\n",
>>>>>>>> + (HOST_WIDEST_INT) count,
>>>>>>>> + (HOST_WIDEST_INT) ENTRY_BLOCK_PTR_FOR_FN (src_cfun)->count,
>>>>>>>> + (HOST_WIDEST_INT) count_scale,
>>>>>>>> + id->src_node->name (), id->dst_node->name ());
>>>>>>>> +
>>>>>>>> /* Register specific tree functions. */
>>>>>>>> gimple_register_cfg_hooks ();
>>>>>>>>
>>>>>>>> Index: tree-profile.c
>>>>>>>> ===================================================================
>>>>>>>> --- tree-profile.c (revision 207436)
>>>>>>>> +++ tree-profile.c (working copy)
>>>>>>>> @@ -558,6 +558,52 @@ gimple_gen_ior_profiler (histogram_value value, un
>>>>>>>> gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>>>>>>>> }
>>>>>>>>
>>>>>>>> +/* Update call statements when UPDATE_CALLS, and rebuild the cgraph edges. */
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +rebuild_cgraph (bool update_calls)
>>>>>>>> +{
>>>>>>>> + struct cgraph_node *node;
>>>>>>>> +
>>>>>>>> + FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>> + {
>>>>>>>> + basic_block bb;
>>>>>>>> +
>>>>>>>> + if (!gimple_has_body_p (node->decl)
>>>>>>>> + || !(!node->clone_of
>>>>>>>> + || node->decl != node->clone_of->decl))
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + /* Don't profile functions produced for builtin stuff. */
>>>>>>>> + if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>>>> + continue;
>>>>>>>> +
>>>>>>>> + push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> +
>>>>>>>> + if (update_calls)
>>>>>>>> + {
>>>>>>>> + FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> + {
>>>>>>>> + gimple_stmt_iterator gsi;
>>>>>>>> + for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> + {
>>>>>>>> + gimple stmt = gsi_stmt (gsi);
>>>>>>>> + if (is_gimple_call (stmt))
>>>>>>>> + update_stmt (stmt);
>>>>>>>> + }
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + /* re-merge split blocks. */
>>>>>>>> + cleanup_tree_cfg ();
>>>>>>>> + update_ssa (TODO_update_ssa);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + rebuild_cgraph_edges ();
>>>>>>>> +
>>>>>>>> + pop_cfun ();
>>>>>>>> + }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> /* Profile all functions in the callgraph. */
>>>>>>>>
>>>>>>>> static unsigned int
>>>>>>>> @@ -622,43 +668,14 @@ tree_profiling (void)
>>>>>>>> }
>>>>>>>>
>>>>>>>> /* Update call statements and rebuild the cgraph. */
>>>>>>>> - FOR_EACH_DEFINED_FUNCTION (node)
>>>>>>>> - {
>>>>>>>> - basic_block bb;
>>>>>>>> + rebuild_cgraph (true);
>>>>>>>>
>>>>>>>> - if (!gimple_has_body_p (node->decl)
>>>>>>>> - || !(!node->clone_of
>>>>>>>> - || node->decl != node->clone_of->decl))
>>>>>>>> - continue;
>>>>>>>> + /* If the profiles were dropped on any functions, unfortunately we
>>>>>>>> + need to rebuild the cgraph to propagate the new reestimated counts
>>>>>>>> + and avoid sanity failures due to inconsistencies. */
>>>>>>>> + if (handle_missing_profiles ())
>>>>>>>> + rebuild_cgraph (false);
>>>>>>>>
>>>>>>>> - /* Don't profile functions produced for builtin stuff. */
>>>>>>>> - if (DECL_SOURCE_LOCATION (node->decl) == BUILTINS_LOCATION)
>>>>>>>> - continue;
>>>>>>>> -
>>>>>>>> - push_cfun (DECL_STRUCT_FUNCTION (node->decl));
>>>>>>>> -
>>>>>>>> - FOR_EACH_BB_FN (bb, cfun)
>>>>>>>> - {
>>>>>>>> - gimple_stmt_iterator gsi;
>>>>>>>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>>>>>>> - {
>>>>>>>> - gimple stmt = gsi_stmt (gsi);
>>>>>>>> - if (is_gimple_call (stmt))
>>>>>>>> - update_stmt (stmt);
>>>>>>>> - }
>>>>>>>> - }
>>>>>>>> -
>>>>>>>> - /* re-merge split blocks. */
>>>>>>>> - cleanup_tree_cfg ();
>>>>>>>> - update_ssa (TODO_update_ssa);
>>>>>>>> -
>>>>>>>> - rebuild_cgraph_edges ();
>>>>>>>> -
>>>>>>>> - pop_cfun ();
>>>>>>>> - }
>>>>>>>> -
>>>>>>>> - handle_missing_profiles ();
>>>>>>>> -
>>>>>>>> del_node_map ();
>>>>>>>> return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Teresa Johnson | Software Engineer | [email protected] | 408-460-2413