I've updated the patch to do the check in ipa-inline.c, in
want_early_inline_function_p, instead of changing cgraph_maybe_hot_edge_p:
Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c (revision 199593)
+++ gcc/ipa-inline.c (working copy)
@@ -434,6 +434,16 @@ want_early_inline_function_p (struct cgraph_edge *
       if (growth <= PARAM_VALUE (PARAM_EARLY_INLINING_INSNS_ANY))
         ;
+      else if (flag_auto_profile)
+        {
+          if (dump_file)
+            fprintf (dump_file, " will not early inline: %s/%i->%s/%i, "
+                     "call is cold in profiling and code would grow by %i\n",
+                     xstrdup (cgraph_node_name (e->caller)), e->caller->uid,
+                     xstrdup (cgraph_node_name (callee)), callee->uid,
+                     growth);
+          want_inline = false;
+        }
       else if (!cgraph_maybe_hot_edge_p (e))
         {
           if (dump_file)
Thanks,
Dehao
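
P.S. To make the order of the checks easier to follow, here is a minimal
standalone sketch of the decision as it stands with this patch. It is a
simplified model and not the real GCC code: edge_model, limits_model and
want_early_inline_model are hypothetical names, the maybe_hot field only
stands in for cgraph_maybe_hot_edge_p (e), and the two limits stand in for
PARAM_EARLY_INLINING_INSNS_ANY and PARAM_EARLY_INLINING_INSNS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a call graph edge; this is not
   GCC's struct cgraph_edge.  */
struct edge_model
{
  int growth;      /* estimated code growth if the call is inlined */
  bool maybe_hot;  /* stand-in for cgraph_maybe_hot_edge_p (e) */
};

/* Hypothetical stand-ins for the two --param limits.  */
struct limits_model
{
  int insns_any;  /* models PARAM_EARLY_INLINING_INSNS_ANY */
  int insns;      /* models PARAM_EARLY_INLINING_INSNS */
};

/* Model of the check order with the patch applied.  Returns true when
   the edge would still be early inlined.  */
static bool
want_early_inline_model (const struct edge_model *e,
                         const struct limits_model *lim,
                         bool auto_profile)
{
  assert (e != NULL && lim != NULL);

  /* 1. Growth within the "any" limit is always accepted.  */
  if (e->growth <= lim->insns_any)
    return true;

  /* 2. New branch: with auto profile, reject anything that grows
        beyond the "any" limit.  */
  if (auto_profile)
    return false;

  /* 3. Existing branch: a cold edge is rejected before the size
        limit is even looked at.  */
  if (!e->maybe_hot)
    return false;

  /* 4. Existing branch: a hot edge is accepted up to the size limit.  */
  return e->growth <= lim->insns;
}
```

In this model an edge with growth 5 is accepted without auto profile when
it is maybe-hot (assuming a size limit of 11), and rejected with auto
profile whether or not it is hot. The hot/cold predicate itself is left
unchanged.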
On Sun, Jun 2, 2013 at 9:08 PM, Xinliang David Li <[email protected]> wrote:
> If the purpose of the fix is to filter early inlinings with code
> growth in autoFDO, the proposed fix is the wrong way to do it -- it
> changes the meaning of cgraph_maybe_hot_edge_p.
>
> David
>
> On Sun, Jun 2, 2013 at 7:25 PM, Dehao Chen <[email protected]> wrote:
>> On Sun, Jun 2, 2013 at 7:14 PM, Xinliang David Li <[email protected]> wrote:
>>>
>>> Auto profile info is not available yet in early inlining, so why
>>> would this change make any difference?
>>
>> Because the check of PARAM_EARLY_INLINING_INSNS is after the check of
>> cgraph_maybe_hot_edge_p in early inline. If
>> cgraph_maybe_hot_edge_p fails, the early inline will not happen even
>> if growth is less than PARAM_EARLY_INLINING_INSNS.
>>
>>>
>>> Can you just reset the max_iters to a
>>> higher value for autoFDO?
>>
>> We could do that, but it could still lead to some code bloat because
>> recursive inlining can still run for up to, say, 10 iterations.
>>
>> Dehao
>>
>>>
>>> David
>>>
>>> On Sun, Jun 2, 2013 at 6:21 PM, Dehao Chen <[email protected]> wrote:
>>> > The patch was committed to google-4_8, but it causes a problem because
>>> > einline sets PARAM_EARLY_INLINING_INSNS = 11. This will cause
>>> > recursive inlining at the einline stage (e.g. main->foo, foo->bar,
>>> > bar->foo) when autofdo is enabled.
>>> >
>>> > The following patch can fix the problem by doing more targeted early
>>> > inlining:
>>> >
>>> > Index: gcc/predict.c
>>> > ===================================================================
>>> > --- gcc/predict.c (revision 199593)
>>> > +++ gcc/predict.c (working copy)
>>> > @@ -175,6 +175,8 @@ cgraph_maybe_hot_edge_p (struct cgraph_edge *edge)
>>> >        && !maybe_hot_count_p (NULL,
>>> >                               edge->count))
>>> >      return false;
>>> > +  if (flag_auto_profile)
>>> > +    return false;
>>> >    if (edge->caller->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED
>>> >        || (edge->callee
>>> >            && edge->callee->frequency == NODE_FREQUENCY_UNLIKELY_EXECUTED))
>>> >
>>> > Performance testing is ongoing...
>>> >
>>> > Dehao
>>> >
>>> > On Wed, May 29, 2013 at 3:44 PM, Dehao Chen <[email protected]> wrote:
>>> >> OK, I'll commit the early inline part.
>>> >>
>>> >> Dehao
>>> >>
>>> >> On Wed, May 29, 2013 at 10:00 AM, Xinliang David Li <[email protected]> wrote:
>>> >>> The early inlining part is ok. The tracer optimization should be
>>> >>> revisited -- we should have finer-grained control over it (for
>>> >>> instance, based on the FDO summary -- but that should be common to
>>> >>> FDO/LIPO).
>>> >>>
>>> >>> David
>>> >>>
>>> >>> On Wed, May 29, 2013 at 9:39 AM, Dehao Chen <[email protected]> wrote:
>>> >>>> In gcc4-8, the maximum number of einline iterations is restricted
>>> >>>> to 1. For AutoFDO, this is bad because early inline is not size
>>> >>>> restricted. This patch allows einline to do multiple iterations in
>>> >>>> AutoFDO. It also enables the tracer optimization in AutoFDO.
>>> >>>>
>>> >>>> Bootstrapped and passed regression test.
>>> >>>>
>>> >>>> OK for google-4_8?
>>> >>>>
>>> >>>> Thanks,
>>> >>>> Dehao
>>> >>>>
>>> >>>> Index: gcc/ipa-inline.c
>>> >>>> ===================================================================
>>> >>>> --- gcc/ipa-inline.c (revision 199416)
>>> >>>> +++ gcc/ipa-inline.c (working copy)
>>> >>>> @@ -2161,7 +2161,8 @@ early_inliner (void)
>>> >>>>      {
>>> >>>>        /* We iterate incremental inlining to get trivial cases of indirect
>>> >>>>           inlining. */
>>> >>>> -      while (iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS)
>>> >>>> +      while ((flag_auto_profile
>>> >>>> +              || iterations < PARAM_VALUE (PARAM_EARLY_INLINER_MAX_ITERATIONS))
>>> >>>>               && early_inline_small_functions (node))
>>> >>>>          {
>>> >>>>            timevar_push (TV_INTEGRATION);
>>> >>>> Index: gcc/opts.c
>>> >>>> ===================================================================
>>> >>>> --- gcc/opts.c (revision 199416)
>>> >>>> +++ gcc/opts.c (working copy)
>>> >>>> @@ -1644,6 +1644,8 @@ common_handle_option (struct gcc_options *opts,
>>> >>>>          opts->x_flag_peel_loops = value;
>>> >>>>        if (!opts_set->x_flag_value_profile_transformations)
>>> >>>>          opts->x_flag_value_profile_transformations = value;
>>> >>>> +      if (!opts_set->x_flag_tracer)
>>> >>>> +        opts->x_flag_tracer = value;
>>> >>>>        if (!opts_set->x_flag_inline_functions)
>>> >>>>          opts->x_flag_inline_functions = value;
>>> >>>>        if (!opts_set->x_flag_ipa_cp)