Hi Martin,

> Hi,
> 
> On Thu, Aug 20 2020, Richard Sandiford wrote:
> >>
> >>
> >> Really appreciate for your detailed explanation.  BTW, My previous
> >> patch for PGO build on exchange2 takes this similar method by setting
> >> each cloned node to 1/10th of the frequency several month agao :)
> >>
> >> https://gcc.gnu.org/pipermail/gcc-patches/2020-June/546926.html
> >
> > Does it seem likely that we'll reach a resolution on this soon?
> > I take the point that the patch that introduced the exchange
> > regression
> > [https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551757.html]
> > was just uncovering a latent issue, but still, this is a large
> > regression in an important benchmark to be carrying around.  For those
> > of us doing regular benchmark CI, the longer the performance trough
> > gets, the harder it is to spot other unrelated regressions in the “properly
> optimised” code.
> >
> > So if we don't have a specific plan for fixing the regression soon, I
> > think we should consider reverting the patch until we have something
> > that avoids the exchange regression, even though the patch wasn't
> > technically wrong.
> 
> Honza's changes have been motivated to big extent as an enabler for IPA-CP
> heuristics changes to actually speed up 548.exchange2_r.
> 
> On my AMD Zen2 machine, the run-time of exchange2 was 358 seconds two
> weeks ago, this week it is 403, but with my WIP (and so far untested) patch
> below it is just 276 seconds - faster than one built with GCC 8 which needs
> 283 seconds.
> 
> I'll be interested in knowing if it also works this well on other 
> architectures.
> 

Many thanks for working on this!

I tried this on an AArch64 Neoverse-N1 machine and didn't see any difference.
Do I need any flags for it to work? The patch was applied on top of 
656218ab982cc22b826227045826c92743143af1

And I tried 3 runs
1) -mcpu=native -Ofast -fomit-frame-pointer -flto --param 
ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=80 
-fno-inline-functions-called-once
2) -mcpu=native -Ofast -fomit-frame-pointer -flto 
-fno-inline-functions-called-once
3) -mcpu=native -Ofast -fomit-frame-pointer -flto

First one used to give us the best result, with this patch there's no 
difference between 1 and 2 (11% regression) and the 3rd one is about 15% on top 
of that.

If there's anything I can do to help just let me know.

Cheers,
Tamar

> The patch still needs a bit of a cleanup.  The change of the default value of
> ipa-cp-unit-growth needs to be done only for small compilation units (like
> inlining does).  I should experiment if the value of
> param_ipa_cp_loop_hint_bonus should be changed or not.  And last but not
> least, I also want to clean-up the interfaces between ipa-fnsummary.c and
> ipa-cp.c a bit.  I am working on all of this and hope to finish the patch set 
> in a
> few (working) days.
> 
> The bottom line is that there is a plan to address this regression.
> 
> Martin
> 
> 
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c index e4910a04ffa..0d44310503a
> 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3190,11 +3190,23 @@ devirtualization_time_bonus (struct
> cgraph_node *node,
>  /* Return time bonus incurred because of HINTS.  */
> 
>  static int
> -hint_time_bonus (cgraph_node *node, ipa_hints hints)
> +hint_time_bonus (cgraph_node *node, ipa_hints hints, sreal
> known_iter_freq,
> +              sreal known_strides_freq)
>  {
>    int result = 0;
> -  if (hints & (INLINE_HINT_loop_iterations | INLINE_HINT_loop_stride))
> -    result += opt_for_fn (node->decl, param_ipa_cp_loop_hint_bonus);
> +  sreal bonus_for_one = opt_for_fn (node->decl,
> + param_ipa_cp_loop_hint_bonus);
> +
> +  if (hints & INLINE_HINT_loop_iterations)
> +    {
> +      /* FIXME: This should probably be way more nuanced.  */
> +      result += (known_iter_freq * bonus_for_one).to_int ();
> +    }
> +  if (hints & INLINE_HINT_loop_stride)
> +    {
> +      /* FIXME: And this as well.  */
> +      result += (known_strides_freq * bonus_for_one).to_int ();
> +    }
> +
>    return result;
>  }
> 
> @@ -3395,12 +3407,13 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
>                              int est_move_cost, ipcp_value_base *val)  {
>    int size, time_benefit;
> -  sreal time, base_time;
> +  sreal time, base_time, known_iter_freq, known_strides_freq;
>    ipa_hints hints;
> 
>    estimate_ipcp_clone_size_and_time (node, known_csts, known_contexts,
>                                    known_aggs, &size, &time,
> -                                  &base_time, &hints);
> +                                  &base_time, &hints, &known_iter_freq,
> +                                  &known_strides_freq);
>    base_time -= time;
>    if (base_time > 65535)
>      base_time = 65535;
> @@ -3414,7 +3427,7 @@ perform_estimation_of_a_value (cgraph_node
> *node, vec<tree> known_csts,
>      time_benefit = base_time.to_int ()
>        + devirtualization_time_bonus (node, known_csts, known_contexts,
>                                    known_aggs)
> -      + hint_time_bonus (node, hints)
> +      + hint_time_bonus (node, hints, known_iter_freq,
> + known_strides_freq)
>        + removable_params_cost + est_move_cost;
> 
>    gcc_checking_assert (size >=0);
> @@ -3476,7 +3489,7 @@ estimate_local_effects (struct cgraph_node *node)
>      {
>        struct caller_statistics stats;
>        ipa_hints hints;
> -      sreal time, base_time;
> +      sreal time, base_time, known_iter_freq, known_strides_freq;
>        int size;
> 
>        init_caller_stats (&stats);
> @@ -3484,9 +3497,11 @@ estimate_local_effects (struct cgraph_node
> *node)
>                                             false);
>        estimate_ipcp_clone_size_and_time (node, known_csts,
> known_contexts,
>                                        known_aggs, &size, &time,
> -                                      &base_time, &hints);
> +                                      &base_time, &hints,
> &known_iter_freq,
> +                                      &known_strides_freq);
>        time -= devirt_bonus;
> -      time -= hint_time_bonus (node, hints);
> +      time -= hint_time_bonus (node, hints, known_iter_freq,
> +                            known_strides_freq);
>        time -= removable_params_cost;
>        size -= stats.n_calls * removable_params_cost;
> 
> diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c index
> 2cfab40156e..29fac5db19f 100644
> --- a/gcc/ipa-fnsummary.c
> +++ b/gcc/ipa-fnsummary.c
> @@ -310,6 +310,35 @@ set_hint_predicate (predicate **p, predicate
> new_predicate)
>      }
>  }
> 
> +/* Find if NEW_PREDICATE is already in V and if so, increment its freq.
> +   Otherwise add a new item to the vector with this predicate and frerq
> equal
> +   to add_freq.  */
> +
> +static void
> +add_freqcounting_predicate (vec<ipa_freqcounting_predicate, va_gc> **v,
> +                         const predicate &new_predicate, sreal add_freq) {
> +  if (new_predicate == false || new_predicate == true)
> +    return;
> +  ipa_freqcounting_predicate *f;
> +  for (int i = 0; vec_safe_iterate (*v, i, &f); i++)
> +    if (new_predicate == f->predicate)
> +      {
> +     f->freq += add_freq;
> +     return;
> +      }
> +  /* FIXME: Make this a parameter.  */
> +  if (vec_safe_length (*v) >= 32)
> +    /* Too many different predicates to account for.  */
> +    return;
> +
> +  ipa_freqcounting_predicate fcp;
> +  fcp.predicate = NULL;
> +  set_hint_predicate (&fcp.predicate, new_predicate);
> +  fcp.freq = add_freq;
> +  vec_safe_push (*v, fcp);
> +  return;
> +}
> 
>  /* Compute what conditions may or may not hold given information about
>     parameters.  RET_CLAUSE returns truths that may hold in a specialized 
> copy,
> @@ -722,10 +751,12 @@ ipa_call_summary::~ipa_call_summary ()
> 
>  ipa_fn_summary::~ipa_fn_summary ()
>  {
> -  if (loop_iterations)
> -    edge_predicate_pool.remove (loop_iterations);
> -  if (loop_stride)
> -    edge_predicate_pool.remove (loop_stride);
> +  unsigned len = vec_safe_length (loop_iterations);  for (unsigned i =
> + 0; i < len; i++)
> +    edge_predicate_pool.remove ((*loop_iterations)[i].predicate);
> +  len = vec_safe_length (loop_strides);  for (unsigned i = 0; i < len;
> + i++)
> +    edge_predicate_pool.remove ((*loop_strides)[i].predicate);
>    vec_free (conds);
>    vec_free (size_time_table);
>    vec_free (call_size_time_table);
> @@ -741,24 +772,33 @@ ipa_fn_summary_t::remove_callees (cgraph_node
> *node)
>      ipa_call_summaries->remove (e);
>  }
> 
> -/* Same as remap_predicate_after_duplication but handle hint predicate *P.
> -   Additionally care about allocating new memory slot for updated predicate
> -   and set it to NULL when it becomes true or false (and thus uninteresting).
> - */
> +/* Duplicate predicates in loop hint vector, allocating memory for them and
> +   remove and deallocate any uninteresting (true or false) ones.  Return the
> +   result.  */
> 
> -static void
> -remap_hint_predicate_after_duplication (predicate **p,
> -                                     clause_t possible_truths)
> +static vec<ipa_freqcounting_predicate, va_gc> *
> +remap_freqcounting_preds_after_dup (vec<ipa_freqcounting_predicate,
> va_gc> *v,
> +                                 clause_t possible_truths)
>  {
> -  predicate new_predicate;
> +  if (vec_safe_length (v) == 0)
> +    return NULL;
> 
> -  if (!*p)
> -    return;
> +  vec<ipa_freqcounting_predicate, va_gc> *res = v->copy ();
> +  int len = res->length();
> +  for (int i = len - 1; i >= 0; i--)
> +    {
> +      predicate new_predicate
> +     = (*res)[i].predicate->remap_after_duplication (possible_truths);
> +      /* We do not want to free previous predicate; it is used by node
> +      origin.  */
> +      (*res)[i].predicate = NULL;
> +      set_hint_predicate (&(*res)[i].predicate, new_predicate);
> +
> +      if (!(*res)[i].predicate)
> +     res->unordered_remove (i);
> +    }
> 
> -  new_predicate = (*p)->remap_after_duplication (possible_truths);
> -  /* We do not want to free previous predicate; it is used by node origin.  
> */
> -  *p = NULL;
> -  set_hint_predicate (p, new_predicate);
> +  return res;
>  }
> 
> 
> @@ -874,9 +914,11 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
>           optimized_out_size += es->call_stmt_size *
> ipa_fn_summary::size_scale;
>         edge_set_predicate (edge, &new_predicate);
>       }
> -      remap_hint_predicate_after_duplication (&info->loop_iterations,
> +      info->loop_iterations
> +     = remap_freqcounting_preds_after_dup (info->loop_iterations,
>                                             possible_truths);
> -      remap_hint_predicate_after_duplication (&info->loop_stride,
> +      info->loop_strides
> +     = remap_freqcounting_preds_after_dup (info->loop_strides,
>                                             possible_truths);
> 
>        /* If inliner or someone after inliner will ever start producing @@ 
> -888,17
> +930,21 @@ ipa_fn_summary_t::duplicate (cgraph_node *src,
>    else
>      {
>        info->size_time_table = vec_safe_copy (info->size_time_table);
> -      if (info->loop_iterations)
> +      info->loop_iterations = vec_safe_copy (info->loop_iterations);
> +      info->loop_strides = vec_safe_copy (info->loop_strides);
> +
> +      ipa_freqcounting_predicate *f;
> +      for (int i = 0; vec_safe_iterate (info->loop_iterations, i, &f);
> + i++)
>       {
> -       predicate p = *info->loop_iterations;
> -       info->loop_iterations = NULL;
> -       set_hint_predicate (&info->loop_iterations, p);
> +       predicate p = *f->predicate;
> +       f->predicate = NULL;
> +       set_hint_predicate (&f->predicate, p);
>       }
> -      if (info->loop_stride)
> +      for (int i = 0; vec_safe_iterate (info->loop_strides, i, &f);
> + i++)
>       {
> -       predicate p = *info->loop_stride;
> -       info->loop_stride = NULL;
> -       set_hint_predicate (&info->loop_stride, p);
> +       predicate p = *f->predicate;
> +       f->predicate = NULL;
> +       set_hint_predicate (&f->predicate, p);
>       }
>      }
>    if (!dst->inlined_to)
> @@ -1057,15 +1103,28 @@ ipa_dump_fn_summary (FILE *f, struct
> cgraph_node *node)
>               }
>             fprintf (f, "\n");
>           }
> -       if (s->loop_iterations)
> +       ipa_freqcounting_predicate *fcp;
> +       bool first_fcp = true;
> +       for (int i = 0; vec_safe_iterate (s->loop_iterations, i, &fcp); i++)
>           {
> -           fprintf (f, "  loop iterations:");
> -           s->loop_iterations->dump (f, s->conds);
> +           if (first_fcp)
> +             {
> +               fprintf (f, "  loop iterations:");
> +               first_fcp = false;
> +             }
> +           fprintf (f, "  %3.2f for ", fcp->freq.to_double ());
> +           fcp->predicate->dump (f, s->conds);
>           }
> -       if (s->loop_stride)
> +       first_fcp = true;
> +       for (int i = 0; vec_safe_iterate (s->loop_strides, i, &fcp); i++)
>           {
> -           fprintf (f, "  loop stride:");
> -           s->loop_stride->dump (f, s->conds);
> +           if (first_fcp)
> +             {
> +               fprintf (f, "  loop strides:");
> +               first_fcp = false;
> +             }
> +           fprintf (f, "  %3.2f for :", fcp->freq.to_double ());
> +           fcp->predicate->dump (f, s->conds);
>           }
>         fprintf (f, "  calls:\n");
>         dump_ipa_call_summary (f, 4, node, s); @@ -2514,12 +2573,13 @@
> analyze_function_body (struct cgraph_node *node, bool early)
> 
>    if (fbi.info)
>      compute_bb_predicates (&fbi, node, info, params_summary);
> +  const profile_count entry_count = ENTRY_BLOCK_PTR_FOR_FN
> + (cfun)->count;
>    order = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>    nblocks = pre_and_rev_post_order_compute (NULL, order, false);
>    for (n = 0; n < nblocks; n++)
>      {
>        bb = BASIC_BLOCK_FOR_FN (cfun, order[n]);
> -      freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (cfun)-
> >count);
> +      freq = bb->count.to_sreal_scale (entry_count);
>        if (clobber_only_eh_bb_p (bb))
>       {
>         if (dump_file && (dump_flags & TDF_DETAILS)) @@ -2758,23
> +2818,27 @@ analyze_function_body (struct cgraph_node *node, bool early)
> 
>    if (nonconstant_names.exists () && !early)
>      {
> +      ipa_fn_summary *s = ipa_fn_summaries->get (node);
>        class loop *loop;
> -      predicate loop_iterations = true;
> -      predicate loop_stride = true;
> 
>        if (dump_file && (dump_flags & TDF_DETAILS))
>       flow_loops_dump (dump_file, NULL, 0);
>        scev_initialize ();
>        FOR_EACH_LOOP (loop, 0)
>       {
> +       predicate loop_iterations = true;
> +       sreal header_freq;
>         vec<edge> exits;
>         edge ex;
>         unsigned int j;
>         class tree_niter_desc niter_desc;
>         if (loop->header->aux)
> -         bb_predicate = *(predicate *) loop->header->aux;
> +         {
> +           bb_predicate = *(predicate *) loop->header->aux;
> +           header_freq = loop->header->count.to_sreal_scale
> (entry_count);
> +         }
>         else
> -         bb_predicate = false;
> +         continue;
> 
>         exits = get_loop_exit_edges (loop);
>         FOR_EACH_VEC_ELT (exits, j, ex)
> @@ -2790,10 +2854,10 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>               will_be_nonconstant = bb_predicate & will_be_nonconstant;
>             if (will_be_nonconstant != true
>                 && will_be_nonconstant != false)
> -             /* This is slightly inprecise.  We may want to represent each
> -                loop with independent predicate.  */
>               loop_iterations &= will_be_nonconstant;
>           }
> +       add_freqcounting_predicate (&s->loop_iterations, loop_iterations,
> +                                   header_freq);
>         exits.release ();
>       }
> 
> @@ -2803,14 +2867,20 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>        for (loop = loops_for_fn (cfun)->tree_root->inner;
>          loop != NULL; loop = loop->next)
>       {
> +       predicate loop_stride = true;
> +       sreal bb_freq;
>         basic_block *body = get_loop_body (loop);
>         for (unsigned i = 0; i < loop->num_nodes; i++)
>           {
>             gimple_stmt_iterator gsi;
>             if (body[i]->aux)
> -             bb_predicate = *(predicate *) body[i]->aux;
> +             {
> +               bb_predicate = *(predicate *) body[i]->aux;
> +               bb_freq = body[i]->count.to_sreal_scale (entry_count);
> +             }
>             else
> -             bb_predicate = false;
> +             continue;
> +
>             for (gsi = gsi_start_bb (body[i]); !gsi_end_p (gsi);
>                  gsi_next (&gsi))
>               {
> @@ -2839,16 +2909,13 @@ analyze_function_body (struct cgraph_node
> *node, bool early)
>                   will_be_nonconstant = bb_predicate &
> will_be_nonconstant;
>                 if (will_be_nonconstant != true
>                     && will_be_nonconstant != false)
> -                 /* This is slightly inprecise.  We may want to represent
> -                    each loop with independent predicate.  */
>                   loop_stride = loop_stride & will_be_nonconstant;
>               }
>           }
> +       add_freqcounting_predicate (&s->loop_strides, loop_stride,
> +                                   bb_freq);
>         free (body);
>       }
> -      ipa_fn_summary *s = ipa_fn_summaries->get (node);
> -      set_hint_predicate (&s->loop_iterations, loop_iterations);
> -      set_hint_predicate (&s->loop_stride, loop_stride);
>        scev_finalize ();
>      }
>    FOR_ALL_BB_FN (bb, my_function)
> @@ -3640,14 +3707,32 @@ ipa_call_context::estimate_size_and_time (int
> *ret_size,
>    if (time > nonspecialized_time)
>      time = nonspecialized_time;
> 
> +  m_loops_with_known_iterations = 0;
> +  ipa_freqcounting_predicate *fcp;
> +  for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> +    {
> +      gcc_assert (fcp->predicate);
> +      if (!fcp->predicate->evaluate (m_possible_truths))
> +     {
> +       if (ret_hints)
> +         hints |= INLINE_HINT_loop_iterations;
> +       m_loops_with_known_iterations += fcp->freq;
> +     }
> +    }
> +  m_loops_with_known_strides = 0;
> +  for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> +    {
> +      gcc_assert (fcp->predicate);
> +      if (!fcp->predicate->evaluate (m_possible_truths))
> +     {
> +       if (ret_hints)
> +         hints |= INLINE_HINT_loop_stride;
> +       m_loops_with_known_strides += fcp->freq;
> +     }
> +    }
> +
>    if (ret_hints)
>      {
> -      if (info->loop_iterations
> -       && !info->loop_iterations->evaluate (m_possible_truths))
> -     hints |= INLINE_HINT_loop_iterations;
> -      if (info->loop_stride
> -       && !info->loop_stride->evaluate (m_possible_truths))
> -     hints |= INLINE_HINT_loop_stride;
>        if (info->scc_no)
>       hints |= INLINE_HINT_in_scc;
>        if (DECL_DECLARED_INLINE_P (m_node->decl)) @@ -3687,7 +3772,9 @@
> estimate_ipcp_clone_size_and_time (struct cgraph_node *node,
>                                  vec<ipa_agg_value_set> known_aggs,
>                                  int *ret_size, sreal *ret_time,
>                                  sreal *ret_nonspec_time,
> -                                ipa_hints *hints)
> +                                ipa_hints *hints,
> +                                sreal *loops_with_known_iterations,
> +                                sreal *loops_with_known_strides)
>  {
>    clause_t clause, nonspec_clause;
> 
> @@ -3699,6 +3786,8 @@ estimate_ipcp_clone_size_and_time (struct
> cgraph_node *node,
>                       known_aggs, vNULL);
>    ctx.estimate_size_and_time (ret_size, NULL, ret_time,
>                             ret_nonspec_time, hints);
> +  *loops_with_known_iterations = ctx.m_loops_with_known_iterations;
> +  *loops_with_known_strides = ctx.m_loops_with_known_strides;
>  }
> 
>  /* Return stack frame offset where frame of NODE is supposed to start
> inside @@ -3857,32 +3946,31 @@ remap_edge_summaries (struct
> cgraph_edge *inlined_edge,
>      }
>  }
> 
> -/* Same as remap_predicate, but set result into hint *HINT.  */
> +/* Run remap_after_inlining on each predicate in V.  */
> 
>  static void
> -remap_hint_predicate (class ipa_fn_summary *info,
> -                   class ipa_node_params *params_summary,
> -                   class ipa_fn_summary *callee_info,
> -                   predicate **hint,
> -                   vec<int> operand_map,
> -                   vec<int> offset_map,
> -                   clause_t possible_truths,
> -                   predicate *toplev_predicate)
> -{
> -  predicate p;
> +remap_freqcounting_predicate (class ipa_fn_summary *info,
> +                           class ipa_node_params *params_summary,
> +                           class ipa_fn_summary *callee_info,
> +                           vec<ipa_freqcounting_predicate, va_gc> *v,
> +                           vec<int> operand_map,
> +                           vec<int> offset_map,
> +                           clause_t possible_truths,
> +                           predicate *toplev_predicate)
> 
> -  if (!*hint)
> -    return;
> -  p = (*hint)->remap_after_inlining
> -                      (info, params_summary, callee_info,
> -                       operand_map, offset_map,
> -                       possible_truths, *toplev_predicate);
> -  if (p != false && p != true)
> +{
> +  ipa_freqcounting_predicate *fcp;
> +  for (int i = 0; vec_safe_iterate (v, i, &fcp); i++)
>      {
> -      if (!*hint)
> -     set_hint_predicate (hint, p);
> -      else
> -     **hint &= p;
> +      predicate p
> +     = fcp->predicate->remap_after_inlining (info, params_summary,
> +                                             callee_info, operand_map,
> +                                             offset_map, possible_truths,
> +                                             *toplev_predicate);
> +      if (p != false && p != true)
> +     /* FIXME: Is this really supposed to be &= and not a plain
> +        assignment?  */
> +     *fcp->predicate &= p;
>      }
>  }
> 
> @@ -3992,12 +4080,12 @@ ipa_merge_fn_summary_after_inlining (struct
> cgraph_edge *edge)
>    remap_edge_summaries (edge, edge->callee, info, params_summary,
>                       callee_info, operand_map,
>                       offset_map, clause, &toplev_predicate);
> -  remap_hint_predicate (info, params_summary, callee_info,
> -                     &callee_info->loop_iterations,
> -                     operand_map, offset_map, clause,
> &toplev_predicate);
> -  remap_hint_predicate (info, params_summary, callee_info,
> -                     &callee_info->loop_stride,
> -                     operand_map, offset_map, clause,
> &toplev_predicate);
> +  remap_freqcounting_predicate (info, params_summary, callee_info,
> +                             info->loop_iterations, operand_map,
> +                             offset_map, clause, &toplev_predicate);
> +  remap_freqcounting_predicate (info, params_summary, callee_info,
> +                             info->loop_strides, operand_map,
> +                             offset_map, clause, &toplev_predicate);
> 
>    HOST_WIDE_INT stack_frame_offset = ipa_get_stack_frame_offset (edge-
> >callee);
>    HOST_WIDE_INT peak = stack_frame_offset + callee_info-
> >estimated_stack_size;
> @@ -4322,12 +4410,34 @@ inline_read_section (struct lto_file_decl_data
> *file_data, const char *data,
>           info->size_time_table->quick_push (e);
>       }
> 
> -      p.stream_in (&ib);
> -      if (info)
> -        set_hint_predicate (&info->loop_iterations, p);
> -      p.stream_in (&ib);
> -      if (info)
> -        set_hint_predicate (&info->loop_stride, p);
> +      count2 = streamer_read_uhwi (&ib);
> +      for (j = 0; j < count2; j++)
> +     {
> +       p.stream_in (&ib);
> +       sreal fcp_freq = sreal::stream_in (&ib);
> +       if (info)
> +         {
> +           ipa_freqcounting_predicate fcp;
> +           fcp.predicate = NULL;
> +           set_hint_predicate (&fcp.predicate, p);
> +           fcp.freq = fcp_freq;
> +           vec_safe_push (info->loop_iterations, fcp);
> +         }
> +     }
> +      count2 = streamer_read_uhwi (&ib);
> +      for (j = 0; j < count2; j++)
> +     {
> +       p.stream_in (&ib);
> +       sreal fcp_freq = sreal::stream_in (&ib);
> +       if (info)
> +         {
> +           ipa_freqcounting_predicate fcp;
> +           fcp.predicate = NULL;
> +           set_hint_predicate (&fcp.predicate, p);
> +           fcp.freq = fcp_freq;
> +           vec_safe_push (info->loop_strides, fcp);
> +         }
> +     }
>        for (e = node->callees; e; e = e->next_callee)
>       read_ipa_call_summary (&ib, e, info != NULL);
>        for (e = node->indirect_calls; e; e = e->next_callee) @@ -4487,14
> +4597,19 @@ ipa_fn_summary_write (void)
>             e->exec_predicate.stream_out (ob);
>             e->nonconst_predicate.stream_out (ob);
>           }
> -       if (info->loop_iterations)
> -         info->loop_iterations->stream_out (ob);
> -       else
> -         streamer_write_uhwi (ob, 0);
> -       if (info->loop_stride)
> -         info->loop_stride->stream_out (ob);
> -       else
> -         streamer_write_uhwi (ob, 0);
> +       ipa_freqcounting_predicate *fcp;
> +       streamer_write_uhwi (ob, vec_safe_length (info->loop_iterations));
> +       for (i = 0; vec_safe_iterate (info->loop_iterations, i, &fcp); i++)
> +         {
> +           fcp->predicate->stream_out (ob);
> +           fcp->freq.stream_out (ob);
> +         }
> +       streamer_write_uhwi (ob, vec_safe_length (info->loop_strides));
> +       for (i = 0; vec_safe_iterate (info->loop_strides, i, &fcp); i++)
> +         {
> +           fcp->predicate->stream_out (ob);
> +           fcp->freq.stream_out (ob);
> +         }
>         for (edge = cnode->callees; edge; edge = edge->next_callee)
>           write_ipa_call_summary (ob, edge);
>         for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
> diff --git a/gcc/ipa-fnsummary.h b/gcc/ipa-fnsummary.h index
> c6ddc9f3199..d8429afdbef 100644
> --- a/gcc/ipa-fnsummary.h
> +++ b/gcc/ipa-fnsummary.h
> @@ -101,6 +101,19 @@ public:
>    }
>  };
> 
> +/* Structure to capture how frequently some interesting events occur given
> a
> +   particular predicate.  The structure is used to estimate how often we
> +   encounter loops with known iteration count or stride in various
> +   contexts.  */
> +
> +struct GTY(()) ipa_freqcounting_predicate {
> +  /* The described event happens with this frequency... */
> +  sreal freq;
> +  /* ...when this predicate evaluates to false. */
> +  class predicate * GTY((skip)) predicate; };
> +
>  /* Function inlining information.  */
>  class GTY(()) ipa_fn_summary
>  {
> @@ -112,8 +125,9 @@ public:
>        inlinable (false), single_caller (false),
>        fp_expressions (false), estimated_stack_size (false),
>        time (0), conds (NULL),
> -      size_time_table (NULL), call_size_time_table (NULL), loop_iterations
> (NULL),
> -      loop_stride (NULL), growth (0), scc_no (0)
> +      size_time_table (NULL), call_size_time_table (NULL),
> +      loop_iterations (NULL), loop_strides (NULL),
> +      growth (0), scc_no (0)
>    {
>    }
> 
> @@ -125,13 +139,12 @@ public:
>      estimated_stack_size (s.estimated_stack_size),
>      time (s.time), conds (s.conds), size_time_table (s.size_time_table),
>      call_size_time_table (NULL),
> -    loop_iterations (s.loop_iterations), loop_stride (s.loop_stride),
> +    loop_iterations (s.loop_iterations), loop_strides (s.loop_strides),
>      growth (s.growth), scc_no (s.scc_no)
>    {}
> 
>    /* Default constructor.  */
>    ~ipa_fn_summary ();
> -
>    /* Information about the function body itself.  */
> 
>    /* Minimal size increase after inlining.  */ @@ -164,12 +177,10 @@ public:
>    vec<size_time_entry, va_gc> *size_time_table;
>    vec<size_time_entry, va_gc> *call_size_time_table;
> 
> -  /* Predicate on when some loop in the function becomes to have known
> -     bounds.   */
> -  predicate * GTY((skip)) loop_iterations;
> -  /* Predicate on when some loop in the function becomes to have known
> -     stride.   */
> -  predicate * GTY((skip)) loop_stride;
> +  /* Predicates on when some loops in the function can have known
> + bounds.  */  vec<ipa_freqcounting_predicate, va_gc> *loop_iterations;
> +  /* Predicates on when some loops in the function can have known
> + strides.  */  vec<ipa_freqcounting_predicate, va_gc> *loop_strides;
>    /* Estimated growth for inlining all copies of the function before start
>       of small functions inlining.
>       This value will get out of date as the callers are duplicated, but @@ 
> -316,6
> +327,15 @@ public:
>    {
>      return m_node != NULL;
>    }
> +
> +
> +  /* How often loops will have known iterations.  Calculated in
> +     estimate_size_and_time.  */
> +  sreal m_loops_with_known_iterations;
> +  /* How often loops will have known strides.  Calculated in
> +     estimate_size_and_time. */
> +  sreal m_loops_with_known_strides;
> +
>  private:
>    /* Called function.  */
>    cgraph_node *m_node;
> @@ -353,7 +373,7 @@ void estimate_ipcp_clone_size_and_time (struct
> cgraph_node *,
>                                       vec<ipa_polymorphic_call_context>,
>                                       vec<ipa_agg_value_set>,
>                                       int *, sreal *, sreal *,
> -                                     ipa_hints *);
> +                                     ipa_hints *, sreal *, sreal *);
>  void ipa_merge_fn_summary_after_inlining (struct cgraph_edge *edge);
> void ipa_update_overall_fn_summary (struct cgraph_node *node, bool
> reset = true);  void compute_fn_summary (struct cgraph_node *, bool); diff -
> -git a/gcc/params.opt b/gcc/params.opt index f39e5d1a012..2a5f3d61727
> 100644
> --- a/gcc/params.opt
> +++ b/gcc/params.opt
> @@ -211,7 +211,7 @@ Common Joined UInteger
> Var(param_ipa_cp_single_call_penalty) Init(15) IntegerRan  Percentage
> penalty functions containing a single call to another function will receive
> when they are evaluated for cloning.
> 
>  -param=ipa-cp-unit-growth=
> -Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(10) Param
> Optimization
> +Common Joined UInteger Var(param_ipa_cp_unit_growth) Init(80) Param
> +Optimization
>  How much can given compilation unit grow because of the interprocedural
> constant propagation (in percent).
> 
>  -param=ipa-cp-value-list-size=

Reply via email to