On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html,
>> attempted to scale down the register limit used by -fsched-pressure for the
>> case where the block in question executes as frequently as the entry block
>> to just the call_clobbered (i.e. call_used) regs. But the code is actually
>> scaling toward call_saved registers. The following patch corrects that by
>> computing call_saved regs per class and subtracting out some scaled portion
>> of that.
>>
>> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
> Hi Pat,
> I stared at your patch and the current code for a good 30 minutes, and I
> still don't see what is wrong with the current code.
> With your patch the number of registers from class CL that the scheduler has
> at its disposal for a single-basic-block function will be:
> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] -
> where call_saved_regs_num is the number of registers in class CL that need to
> be saved in the prologue (i.e., "free" registers). I can see some logic in
> sched_call_regs_num[CL] = call_saved_regs_num[CL];
> but not in subtracting the number of such registers from the number of total
> available hard registers.
> Am I missing something?
Your original patch gave the following reasoning:
"At the moment the scheduler does not account for spills in the prologues and
restores in the epilogue, which occur from use of call-used registers. The
current state is, essentially, optimized for case when there is a hot loop
inside the function, and the loop executes significantly more often than the
prologue/epilogue. However, on the opposite end, we have a case when the
function is just a single non-cyclic basic block, which executes just as often
as prologue / epilogue, so spills in the prologue hurt performance as much as
spills in the basic block itself. In such a case the scheduler should
throttle-down on the number of available registers and try to not go beyond
But the misunderstanding is that call-used registers do NOT cause any
save/restore. That is to say, call-used == call-clobbered. Your last sentence
explains the goal for a single block function, to not go beyond call-clobbered
(i.e. call-used) registers, which makes perfect sense. My patch implements that
goal by subtracting out call_saved_regs_num (those that require prolog/epilog
save/restore) from the total regs, and using that as the target # of registers
to be used for the block.
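
To make the arithmetic concrete, here is a minimal sketch of the scaling
described above. It is not the code from the patch: the function name, its
parameters, and the register counts in the usage note are hypothetical and
chosen only for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not GCC code.  Returns the number of registers of
   one class that the scheduler may use in a block before pressure counts
   as excessive.

   total_regs       all allocatable hard registers in the class
   call_saved_regs  those that need a prologue/epilogue save/restore
   entry_freq       execution frequency of the function's entry block
   bb_freq          execution frequency of the block being scheduled  */
static int
scaled_reg_budget (int total_regs, int call_saved_regs,
                   int entry_freq, int bb_freq)
{
  assert (total_regs >= call_saved_regs && call_saved_regs >= 0);

  /* A block that runs less often than the entry block is treated like
     one that runs equally often, so the scale factor never exceeds 1.  */
  if (bb_freq < entry_freq)
    bb_freq = entry_freq;

  /* No frequency information: leave the limit untouched.  */
  if (bb_freq <= 0)
    return total_regs;

  /* Subtract a scaled portion of the call-saved registers.  When
     bb_freq == entry_freq the whole call-saved set is subtracted,
     leaving only the call-clobbered (call-used) registers.  */
  return total_regs - (call_saved_regs * entry_freq) / bb_freq;
}
```

With made-up counts of 32 total and 18 call-saved registers, a block as
frequent as the entry block gets 32 - 18 = 14 registers, i.e. just the
call-clobbered ones. A block ten times hotter than the entry block gets
32 - 1 = 31, since the prologue/epilogue cost is amortized over many
executions.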
> Also, could you share the testcase that you used to investigate the problem
> with register-aware scheduling? I wonder if there is a problem lurking.
I don't have a testcase. I'm currently trying to get -fsched-pressure to be
beneficial for PowerPC and was familiarizing myself with the code when I
spotted the issue.