> Hello,
> 
> On Mon, 3 Feb 2025, H.J. Lu wrote:
> 
> > Author: Surya Kumari Jangala <jskum...@linux.ibm.com>
> > Date:   Tue Jun 25 08:37:49 2024 -0500
> > 
> >     ira: Scale save/restore costs of callee save registers with block frequency
> > 
> > scales the cost of saving/restoring a callee-save hard register in epilogue
> > and prologue with the entry block frequency, which, if not optimizing for
> > size, is 10000, for all targets.
> 
> This merely represents the fact that the entry block is indeed entered 
> exactly once per function invocation, i.e. 1.0 in fixed point with a scale 
> of 1000.  All costs in ira are (supposed to be) scaled by bb-frequency of 
> the allocno/register occurrence, and hence this add_cost to cater for 
> xlogue-save/restore needs to be scaled by that as well, which is what 
> Surya's patch was adding.


This is a nice summary.  One correction: REG_FREQ_MAX should represent
the maximal frequency in the function (i.e. that of the innermost loop,
if one exists).  So the entry block has REG_FREQ_MAX only in functions
with no BBs that execute with frequency >1.

Frequency 1 is the frequency of the entry block (which the patch uses
correctly).
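
To illustrate the scaling described above, here is a minimal sketch.  The
names (SKETCH_FREQ_MAX, sketch_scaled_freq) are made up for this example
and are not GCC's actual macros or functions; the assumption is simply
that the hottest block in the function maps to the maximum value and
every other block is scaled relative to it.

```c
/* Hypothetical stand-in for a fixed-point frequency ceiling; not GCC's
   real REG_FREQ_MAX definition.  */
#define SKETCH_FREQ_MAX 1000

/* Scale a block's execution count relative to the hottest block in the
   function, so that the hottest block gets SKETCH_FREQ_MAX.  The result
   is clamped to at least 1 so that cold blocks never cost nothing.  */
static int
sketch_scaled_freq (long block_count, long max_count)
{
  long f;

  /* Without usable profile information, treat every block as hottest.  */
  if (max_count <= 0)
    return SKETCH_FREQ_MAX;

  f = block_count * SKETCH_FREQ_MAX / max_count;
  return f > 0 ? (int) f : 1;
}
```

Under these assumptions, an entry block executed once in a straight-line
function gets the maximum value, while the same entry block in a function
whose loop body runs 10 times per invocation gets a tenth of it.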

> 
> Any fallout from that needs to be addressed on top of that, not by 
> reverting it, or by introducing a hook to avoid that.  Think of this scale 
> as an arbitrary value to implement pseudo-fixed-point arithmetic for 
> costs.  All values need to be scaled by it.  That its value is a seemingly 
> large number of 1000 is not the worry, it represents 1.0 .
> 
> If the issue is for instance that callee-saved registers aren't used 
> because the prologue save/restore is now deemed too expensive relative to 
> the around-call-save-restore when a call-clobbered register is used, then 
> either the around-call-save-restore instructions aren't correctly costed 
> (perhaps also missing the scale factor?), or because ties aren't broken 
> nicely, in which case adding a 1 at one or the other place might be 
> needed.

In the testcase I quoted from one of the PRs, the costs were simply
(dynamically) the same: there were two function calls with a combined
frequency of 1.  We may want to add logic representing that static
instruction counts matter too when the dynamic costs are the same, which
indeed can be done by adding 1.
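
A rough sketch of such a tie-breaker follows.  Everything in it is an
assumption made for illustration: the function names, the unit cost of a
save or restore, and the choice to add 1 per emitted instruction.  It is
not the code in ira.

```c
/* Assumed cost of a single save or restore instruction.  */
#define SKETCH_SAVE_RESTORE_COST 1

/* Cost of using a callee-saved register: one save in the prologue and
   one restore in the epilogue, both scaled by the entry block frequency.
   Adding 1 per instruction is the hypothetical static tie-breaker.  */
static long
sketch_callee_saved_cost (int entry_freq)
{
  int insns = 2;

  return (long) insns * SKETCH_SAVE_RESTORE_COST * entry_freq + insns;
}

/* Cost of using a call-clobbered register: one save and one restore
   around every call, each pair scaled by the frequency of the block
   containing the call, plus the same per-instruction tie-breaker.  */
static long
sketch_caller_saved_cost (const int *call_freqs, int n_calls)
{
  long cost = 0;
  int i;

  for (i = 0; i < n_calls; i++)
    cost += 2L * SKETCH_SAVE_RESTORE_COST * call_freqs[i] + 2;
  return cost;
}
```

With an entry frequency of 1000 and two calls of frequency 500 each, the
dynamic parts are both 2000, so only the added instruction counts (2
versus 4) make the callee-saved register the cheaper choice.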

The other thing is that push/pop is shorter than mov, which is
target-specific knowledge.

Honza
> 
> 
> Ciao,
> Michael.
