> Recursive call graph edges, even when they are hot and important for > the compiled program, can never have frequency bigger than one, even > when the actual time savings in the next recursion call are not > realized just once but depend on the depth of recursion. The current > IPA-CP effect propagation code did not take that into account and just > used the frequency, thus severely underestimating the effect. > > This patch artificially boosts values taking part in such calls. If a > value feeds into itself through a recursive call, the frequency of the > edge is multiplied by a parameter with default value of 6, basically > assuming that the recursion will take place 6 times. This value can > of course be subject to change. > > Moreover, values which do not feed into themselves but which were > generated for a self-recursive call with an arithmetic > pass-function (aka the 548.exchange "hack" which however is generally > applicable for recursive functions which count the recursion depth in > a parameter) have the edge frequency multiplied as many times as there > are generated values in the chain. In essence, we will assume they > are all useful. > > This patch partially fixes the current situation when we fail to > optimize 548.exchange with PGO. In the benchmark one recursive edge > count overwhelmingly dominates all other counts in the program and so > we fail to perform the first cloning (for the nonrecursive entry call) > because it looks totally insignificant. > > gcc/ChangeLog: > > 2021-07-16 Martin Jambor <mjam...@suse.cz> > > * params.opt (ipa-cp-recursive-freq-factor): New. > * ipa-cp.c (ipcp_value): Switch to inline initialization. New members > scc_no, self_recursion_generated_level, same_scc and > self_recursion_generated_p. > (ipcp_lattice::add_value): Replaced parameter unlimited with > same_lat_gen_level, usit it determine limit of values and store it to > the value. > (ipcp_lattice<valtype>::print): Dump the new fileds. > (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new > parameter and store it to the new value. > (self_recursively_generated_p): Removed. > (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p > instead of self_recursively_generated_p, store self generation level > to such values. > (value_topo_info<valtype>::add_val): Set scc_no. > (value_topo_info<valtype>::propagate_effects): Multiply frequencies of > recursively feeding values and self generated values by appropriate > new factors.
If you boost every self fed value by factor of 6, I wonder how quickly we run into exponential explosion of the cost (since the frequency should be close to 1 and 6^9=10077696.... I think it would be more robust to simply assume that the job will distribute evenly across the clones. How hard is to implement that? Honza > --- > gcc/ipa-cp.c | 161 ++++++++++++++++++++++++------------------------- > gcc/params.opt | 4 ++ > 2 files changed, 84 insertions(+), 81 deletions(-) > > diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c > index 55b9216337f..b987d975793 100644 > --- a/gcc/ipa-cp.c > +++ b/gcc/ipa-cp.c > @@ -184,30 +184,52 @@ public: > /* The actual value for the given parameter. */ > valtype value; > /* The list of sources from which this value originates. */ > - ipcp_value_source <valtype> *sources; > + ipcp_value_source <valtype> *sources = nullptr; > /* Next pointers in a linked list of all values in a lattice. */ > - ipcp_value *next; > + ipcp_value *next = nullptr; > /* Next pointers in a linked list of values in a strongly connected > component > of values. */ > - ipcp_value *scc_next; > + ipcp_value *scc_next = nullptr; > /* Next pointers in a linked list of SCCs of values sorted topologically > according their sources. */ > - ipcp_value *topo_next; > + ipcp_value *topo_next = nullptr; > /* A specialized node created for this value, NULL if none has been (so > far) > created. */ > - cgraph_node *spec_node; > + cgraph_node *spec_node = nullptr; > /* Depth first search number and low link for topological sorting of > values. */ > - int dfs, low_link; > + int dfs = 0; > + int low_link = 0; > + /* SCC number to identify values which recursively feed into each other. > + Values in the same SCC have the same SCC number. */ > + int scc_no = 0; > + /* Non zero if the value is generated from another value in the same > lattice > + for a self-recursive call, the actual number is how many times the > + operation has been performed. In the unlikely event of the value being > + present in two chains fo self-recursive value generation chains, it is > the > + maximum. */ > + unsigned self_recursion_generated_level = 0; > /* True if this value is currently on the topo-sort stack. */ > - bool on_stack; > - > - ipcp_value() > - : sources (0), next (0), scc_next (0), topo_next (0), > - spec_node (0), dfs (0), low_link (0), on_stack (false) {} > + bool on_stack = false; > > void add_source (cgraph_edge *cs, ipcp_value *src_val, int src_idx, > HOST_WIDE_INT offset); > + > + /* Return true if both THIS value and O feed into each other. */ > + > + bool same_scc (const ipcp_value<valtype> *o) > + { > + return o->scc_no == scc_no; > + } > + > +/* Return true, if a this value has been generated for a self-recursive call > as > + a result of an arithmetic pass-through jump-function acting on a value in > + the same lattice function. */ > + > + bool self_recursion_generated_p () > + { > + return self_recursion_generated_level > 0; > + } > }; > > /* Lattice describing potential values of a formal parameter of a function, > or > @@ -239,7 +261,7 @@ public: > ipcp_value<valtype> *src_val = NULL, > int src_idx = 0, HOST_WIDE_INT offset = -1, > ipcp_value<valtype> **val_p = NULL, > - bool unlimited = false); > + unsigned same_lat_gen_level = 0); > void print (FILE * f, bool dump_sources, bool dump_benefits); > }; > > @@ -498,7 +520,11 @@ ipcp_lattice<valtype>::print (FILE * f, bool > dump_sources, bool dump_benefits) > { > ipcp_value_source<valtype> *s; > > - fprintf (f, " [from:"); > + if (val->self_recursion_generated_p ()) > + fprintf (f, " [self_gen(%i), from:", > + val->self_recursion_generated_level); > + else > + fprintf (f, " [scc: %i, from:", val->scc_no); > for (s = val->sources; s; s = s->next) > fprintf (f, " %i(%f)", s->cs->caller->order, > s->cs->sreal_frequency ().to_double ()); > @@ -1837,12 +1863,13 @@ ipcp_value<valtype>::add_source (cgraph_edge *cs, > ipcp_value *src_val, > SOURCE and clear all other fields. */ > > static ipcp_value<tree> * > -allocate_and_init_ipcp_value (tree source) > +allocate_and_init_ipcp_value (tree cst, unsigned same_lat_gen_level) > { > ipcp_value<tree> *val; > > val = new (ipcp_cst_values_pool.allocate ()) ipcp_value<tree>(); > - val->value = source; > + val->value = cst; > + val->self_recursion_generated_level = same_lat_gen_level; > return val; > } > > @@ -1850,14 +1877,15 @@ allocate_and_init_ipcp_value (tree source) > value to SOURCE and clear all other fields. */ > > static ipcp_value<ipa_polymorphic_call_context> * > -allocate_and_init_ipcp_value (ipa_polymorphic_call_context source) > +allocate_and_init_ipcp_value (ipa_polymorphic_call_context ctx, > + unsigned same_lat_gen_level) > { > ipcp_value<ipa_polymorphic_call_context> *val; > > - // TODO > val = new (ipcp_poly_ctx_values_pool.allocate ()) > ipcp_value<ipa_polymorphic_call_context>(); > - val->value = source; > + val->value = ctx; > + val->self_recursion_generated_level = same_lat_gen_level; > return val; > } > > @@ -1865,8 +1893,12 @@ allocate_and_init_ipcp_value > (ipa_polymorphic_call_context source) > SRC_VAL SRC_INDEX and OFFSET are meant for add_source and have the same > meaning. OFFSET -1 means the source is scalar and not a part of an > aggregate. If non-NULL, VAL_P records address of existing or newly added > - ipcp_value. UNLIMITED means whether value count should not exceed the > limit > - given by PARAM_IPA_CP_VALUE_LIST_SIZE. */ > + ipcp_value. > + > + If the value is generated for a self-recursive call as a result of an > + arithmetic pass-through jump-function acting on a value in the same > lattice, > + SAME_LAT_GEN_LEVEL must be the length of such chain, otherwise it must be > + zero. If it is non-zero, PARAM_IPA_CP_VALUE_LIST_SIZE limit is ignored. > */ > > template <typename valtype> > bool > @@ -1874,7 +1906,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > ipcp_value<valtype> *src_val, > int src_idx, HOST_WIDE_INT offset, > ipcp_value<valtype> **val_p, > - bool unlimited) > + unsigned same_lat_gen_level) > { > ipcp_value<valtype> *val, *last_val = NULL; > > @@ -1890,6 +1922,9 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > if (val_p) > *val_p = val; > > + if (val->self_recursion_generated_level < same_lat_gen_level) > + val->self_recursion_generated_level = same_lat_gen_level; > + > if (ipa_edge_within_scc (cs)) > { > ipcp_value_source<valtype> *s; > @@ -1904,7 +1939,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > return false; > } > > - if (!unlimited && values_count == opt_for_fn (cs->caller->decl, > + if (!same_lat_gen_level && values_count == opt_for_fn (cs->caller->decl, > param_ipa_cp_value_list_size)) > { > /* We can only free sources, not the values themselves, because sources > @@ -1923,7 +1958,7 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > } > > values_count++; > - val = allocate_and_init_ipcp_value (newval); > + val = allocate_and_init_ipcp_value (newval, same_lat_gen_level); > val->add_source (cs, src_val, src_idx, offset); > val->next = NULL; > > @@ -1940,60 +1975,6 @@ ipcp_lattice<valtype>::add_value (valtype newval, > cgraph_edge *cs, > return true; > } > > -/* Return true, if a ipcp_value VAL is orginated from parameter value of > - self-feeding recursive function via some kind of pass-through jump > - function. */ > - > -static bool > -self_recursively_generated_p (ipcp_value<tree> *val) > -{ > - class ipa_node_params *info = NULL; > - > - for (ipcp_value_source<tree> *src = val->sources; src; src = src->next) > - { > - cgraph_edge *cs = src->cs; > - > - if (!src->val || cs->caller != cs->callee->function_symbol ()) > - return false; > - > - if (src->val == val) > - continue; > - > - if (!info) > - info = ipa_node_params_sum->get (cs->caller); > - > - class ipcp_param_lattices *plats = ipa_get_parm_lattices (info, > - src->index); > - ipcp_lattice<tree> *src_lat; > - ipcp_value<tree> *src_val; > - > - if (src->offset == -1) > - src_lat = &plats->itself; > - else > - { > - struct ipcp_agg_lattice *src_aglat; > - > - for (src_aglat = plats->aggs; src_aglat; src_aglat = src_aglat->next) > - if (src_aglat->offset == src->offset) > - break; > - > - if (!src_aglat) > - return false; > - > - src_lat = src_aglat; > - } > - > - for (src_val = src_lat->values; src_val; src_val = src_val->next) > - if (src_val == val) > - break; > - > - if (!src_val) > - return false; > - } > - > - return true; > -} > - > /* A helper function that returns result of operation specified by OPCODE on > the value of SRC_VAL. If non-NULL, OPND1_TYPE is expected type for the > value of SRC_VAL. If the operation is binary, OPND2 is a constant value > @@ -2068,7 +2049,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs, > source, this is absolutely conservative, but could avoid explosion > of lattice's value space, especially when one recursive function > calls another recursive. */ > - if (self_recursively_generated_p (src_val)) > + if (src_val->self_recursion_generated_p ()) > { > ipcp_value_source<tree> *s; > > @@ -2096,7 +2077,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs, > break; > > ret |= dest_lat->add_value (cstval, cs, src_val, src_idx, > - src_offset, &src_val, true); > + src_offset, &src_val, j); > gcc_checking_assert (src_val); > } > } > @@ -2108,7 +2089,7 @@ propagate_vals_across_arith_jfunc (cgraph_edge *cs, > /* Now we do not use self-recursively generated value as propagation > source, otherwise it is easy to make value space of normal lattice > overflow. */ > - if (self_recursively_generated_p (src_val)) > + if (src_val->self_recursion_generated_p ()) > { > ret |= dest_lat->set_contains_variable (); > continue; > @@ -3732,6 +3713,7 @@ value_topo_info<valtype>::add_val (ipcp_value<valtype> > *cur_val) > v = stack; > stack = v->topo_next; > v->on_stack = false; > + v->scc_no = cur_val->dfs; > > v->scc_next = scc_list; > scc_list = v; > @@ -3905,8 +3887,25 @@ value_topo_info<valtype>::propagate_effects () > else > continue; > } > + > + int special_factor = 1; > + if (val->same_scc (src->val)) > + special_factor > + = opt_for_fn(src->cs->caller->decl, > + param_ipa_cp_recursive_freq_factor); > + else if (val->self_recursion_generated_p () > + && (src->cs->callee->function_symbol () > + == src->cs->caller)) > + { > + int max_recur_gen_depth > + = opt_for_fn(src->cs->caller->decl, > + param_ipa_cp_max_recursive_depth); > + special_factor = max_recur_gen_depth > + - val->self_recursion_generated_level + 1; > + } > + > src->val->prop_time_benefit > - += time * src->cs->sreal_frequency (); > + += time * special_factor * src->cs->sreal_frequency (); > } > > if (size < INT_MAX) > diff --git a/gcc/params.opt b/gcc/params.opt > index 92b003e38cb..8d772309407 100644 > --- a/gcc/params.opt > +++ b/gcc/params.opt > @@ -266,6 +266,10 @@ Maximum depth of recursive cloning for self-recursive > function. > Common Joined UInteger Var(param_ipa_cp_min_recursive_probability) Init(2) > Param Optimization > Recursive cloning only when the probability of call being executed exceeds > the parameter. > > +-param=ipa-cp-recursive-freq-factor= > +Common Joined UInteger Var(param_ipa_cp_recursive_freq_factor) Init(6) Param > Optimization > +When propagating IPA-CP effect estimates, multiply frequencies of recursive > edges that that bring back an unchanged value by this factor. > + > -param=ipa-cp-recursion-penalty= > Common Joined UInteger Var(param_ipa_cp_recursion_penalty) Init(40) > IntegerRange(0, 100) Param Optimization > Percentage penalty the recursive functions will receive when they are > evaluated for cloning. > -- > 2.32.0 >