> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
> 
>> 
>> On Oct 17, 2016, at 7:21 PM, Pat Haugen <pthau...@linux.vnet.ibm.com> wrote:
>> 
>> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html, 
>>>> attempted to scale down the register limit used by -fsched-pressure for 
>>>> the case where the block in question executes as frequently as the entry 
>>>> block to just the call_clobbered (i.e. call_used) regs. But the code is 
>>>> actually scaling toward call_saved registers. The following patch corrects 
>>>> that by computing call_saved regs per class and subtracting out some 
>>>> scaled portion of that.
>>>>> 
>>>>> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>>> Hi Pat,
>>> 
>>> I stared at your patch and current code for good 30 minutes, and I still 
>>> don't see what is wrong with the current code.
>>> 
>>> With your patch the number of registers from class CL that scheduler has at 
>>> its disposal for a single-basic-block function will be:
>>> 
>>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - 
>>> call_saved_regs_num[CL];
>>> 
>>> where call_saved_regs_num is number of registers in class CL that need to 
>>> be saved in the prologue (i.e., "free" registers).  I can see some logic in 
>>> setting
>>> 
>>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>>> 
>>> but not in subtracting number of such registers from the number of total 
>>> available hard registers.
>>> 
>>> Am I missing something?
>>> 
>> 
>> Your original patch gave the following reasoning:
>> 
>> "At the moment the scheduler does not account for spills in the prologues 
>> and restores in the epilogue, which occur from use of call-used registers.  
>> The current state is, essentially, optimized for case when there is a hot 
>> loop inside the function, and the loop executes significantly more often 
>> than the prologue/epilogue.  However, on the opposite end, we have a case 
>> when the function is just a single non-cyclic basic block, which executes 
>> just as often as prologue / epilogue, so spills in the prologue hurt 
>> performance as much as spills in the basic block itself.  In such a case the 
>> scheduler should throttle-down on the number of available registers and try 
>> to not go beyond call-clobbered registers."
>> 
>> But the misunderstanding is that call-used registers do NOT cause any 
>> save/restore. That is to say, call-used == call-clobbered. Your last 
>> sentence explains the goal for a single block function, to not go beyond 
>> call-clobbered (i.e. call-used) registers, which makes perfect sense. My 
>> patch implements that goal by subtracting out call_saved_regs_num (those 
>> that require prolog/epilog save/restore) from the total regs, and using that 
>> as the target # of registers to be used for the block.
> 
> I see your point and agree that current code isn't optimal.  However, I don't 
> think your patch is accurate either.  Consider 
> https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume 
> that FIXED_REGISTERS in class CL is set for a third of the registers, and 
> CALL_USED_REGISTERS is set to "1" for another third of registers.  So we have 
> a third available for zero-cost allocation 
> (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost 
> allocation (ALL_REGISTERS-CALL_USED_REGISTERS) and a third non-available 
> (FIXED_REGISTERS).
> 
> For a non-loop-single-basic-block function we should be targeting only the 
> third of register available at zero-cost -- correct?  This is what is done by 
> the current code, but, apparently, by accident.  It seems that the right 
> register count can be obtained with:
> 
>         for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> -         if (call_used_regs[ira_class_hard_regs[cl][i]])
> -           ++call_used_regs_num[cl];
> +         if (!call_used_regs[ira_class_hard_regs[cl][i]]
> +               || fixed_regs[ira_class_hard_regs[cl][i]])
> +           ++call_saved_regs_num[cl];
> 
> Does this look correct to you?

Thinking some more, it seems like fixed_regs should not be available to the 
scheduler no matter what.  Therefore, the number of fixed registers should be 
subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq / 
bb_freq).

--
Maxim Kuvyrkov
www.linaro.org

Reply via email to