Hi,

On Wed, Oct 06 2021, Jan Hubicka wrote:
>> Recursive call graph edges, even when they are hot and important for
>> the compiled program, can never have frequency bigger than one, even
>> when the actual time savings in the next recursion call are not
>> realized just once but depend on the depth of recursion.  The current
>> IPA-CP effect propagation code did not take that into account and just
>> used the frequency, thus severely underestimating the effect.
>> 
>> This patch artificially boosts values taking part in such calls.  If a
>> value feeds into itself through a recursive call, the frequency of the
>> edge is multiplied by a parameter with default value of 6, basically
>> assuming that the recursion will take place 6 times.  This value can
>> of course be subject to change.
>> 
>> Moreover, values which do not feed into themselves but which were
>> generated for a self-recursive call with an arithmetic
>> pass-function (aka the 548.exchange "hack" which however is generally
>> applicable for recursive functions which count the recursion depth in
>> a parameter) have the edge frequency multiplied as many times as there
>> are generated values in the chain.  In essence, we will assume they
>> are all useful.
>> 
>> This patch partially fixes the current situation when we fail to
>> optimize 548.exchange with PGO.  In the benchmark one recursive edge
>> count overwhelmingly dominates all other counts in the program and so
>> we fail to perform the first cloning (for the nonrecursive entry call)
>> because it looks totally insignificant.
>> 
>> gcc/ChangeLog:
>> 
>> 2021-07-16  Martin Jambor  <mjam...@suse.cz>
>> 
>>      * params.opt (ipa-cp-recursive-freq-factor): New.
>>      * ipa-cp.c (ipcp_value): Switch to inline initialization.  New members
>>      scc_no, self_recursion_generated_level, same_scc and
>>      self_recursion_generated_p.
>>      (ipcp_lattice::add_value): Replaced parameter unlimited with
>>      same_lat_gen_level, usit it determine limit of values and store it to
>>      the value.
>>      (ipcp_lattice<valtype>::print): Dump the new fileds.
>>      (allocate_and_init_ipcp_value): Take same_lat_gen_level as a new
>>      parameter and store it to the new value.
>>      (self_recursively_generated_p): Removed.
>>      (propagate_vals_across_arith_jfunc): Use self_recursion_generated_p
>>      instead of self_recursively_generated_p, store self generation level
>>      to such values.
>>      (value_topo_info<valtype>::add_val): Set scc_no.
>>      (value_topo_info<valtype>::propagate_effects): Multiply frequencies of
>>      recursively feeding values and self generated values by appropriate
>>      new factors.
>
> If you boost every self fed value by factor of 6, I wonder how quickly
> we run into exponential explosion of the cost (since the frequency
> should be close to 1 and 6^9=10077696....

The factor of six is applied once for an entire SCC, so we'd reach this
huge number only if there was a chain of nine different recursive
functions - with this patch we assume each one will recurse six times,
so the result is indeed a huge execution count estimate.

The constant is not used for the "self generated" values like those in
exchange, those are handled by the else branch below.  For those we
expect the recursion happens 8 times, because that is how many values we
generate, but the boost is different depending on the recursion depth.

>
> I think it would be more robust to simply assume that the job will
>distribute evenly across the clones.  How hard is to implement that?

This is not an update of counters.  The code tries to estimate execution
time improvement that is will be possible in callees if we clone for
this particular value and so is based on call graph edge frequencies (so
that if in a callee we can save 5 units of time and the frequency is 5,
we estimate we will save 25).  The code has the advantage that it is
universal for both situations when profile feedback is and is not
available.

Martin

Reply via email to