> On Oct 17, 2016, at 7:21 PM, Pat Haugen <pthau...@linux.vnet.ibm.com> wrote:
> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html,
>>> attempted to scale down the register limit used by -fsched-pressure for the
>>> case where the block in question executes as frequently as the entry block
>>> to just the call_clobbered (i.e. call_used) regs. But the code is actually
>>> scaling toward call_saved registers. The following patch corrects that by
>>> computing call_saved regs per class and subtracting out some scaled portion
>>> of that.
>>>> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>> Hi Pat,
>> I stared at your patch and current code for good 30 minutes, and I still
>> don't see what is wrong with the current code.
>> With your patch the number of registers from class CL that scheduler has at
>> its disposal for a single-basic-block function will be:
>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] -
>> where call_saved_regs_num is number of registers in class CL that need to be
>> saved in the prologue (i.e., "free" registers). I can see some logic in
>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>> but not in subtracting number of such registers from the number of total
>> available hard registers.
>> Am I missing something?
> Your original patch gave the following reasoning:
> "At the moment the scheduler does not account for spills in the prologues and
> restores in the epilogue, which occur from use of call-used registers. The
> current state is, essentially, optimized for case when there is a hot loop
> inside the function, and the loop executes significantly more often than the
> prologue/epilogue. However, on the opposite end, we have a case when the
> function is just a single non-cyclic basic block, which executes just as
> often as prologue / epilogue, so spills in the prologue hurt performance as
> much as spills in the basic block itself. In such a case the scheduler
> should throttle-down on the number of available registers and try to not go
> beyond call-clobbered registers."
> But the misunderstanding is that call-used registers do NOT cause any
> save/restore. That is to say, call-used == call-clobbered. Your last sentence
> explains the goal for a single block function, to not go beyond
> call-clobbered (i.e. call-used) registers, which makes perfect sense. My
> patch implements that goal by subtracting out call_saved_regs_num (those that
> require prolog/epilog save/restore) from the total regs, and using that as
> the target # of registers to be used for the block.
I see your point and agree that current code isn't optimal. However, I don't
think your patch is accurate either. Consider
https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume
that FIXED_REGISTERS in class CL is set for a third of the registers, and
CALL_USED_REGISTERS is set to "1" for another third of registers. So we have a
third available for zero-cost allocation (CALL_USED_REGISTERS-FIXED_REGISTERS),
a third available for spill-cost allocation (ALL_REGISTERS-CALL_USED_REGISTERS)
and a third non-available (FIXED_REGISTERS).
For a non-loop-single-basic-block function we should be targeting only the
third of register available at zero-cost -- correct? This is what is done by
the current code, but, apparently, by accident. It seems that the right
register count can be obtained with:
for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
- if (call_used_regs[ira_class_hard_regs[cl][i]])
+ if (!call_used_regs[ira_class_hard_regs[cl][i]]
+ || fixed_regs[ira_class_hard_regs[cl][i]])
Does this look correct to you?