> On Oct 18, 2016, at 1:27 PM, Maxim Kuvyrkov <maxim.kuvyr...@linaro.org> wrote:
>> On Oct 17, 2016, at 7:21 PM, Pat Haugen <pthau...@linux.vnet.ibm.com> wrote:
>> On 10/17/2016 08:17 AM, Maxim Kuvyrkov wrote:
>>>> The patch here, https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01872.html,
>>>> attempted to scale down the register limit used by -fsched-pressure for
>>>> the case where the block in question executes as frequently as the entry
>>>> block to just the call_clobbered (i.e. call_used) regs. But the code is
>>>> actually scaling toward call_saved registers. The following patch corrects
>>>> that by computing call_saved regs per class and subtracting out some
>>>> scaled portion of that.
>>>> Bootstrap/regtest on powerpc64le with no new failures. Ok for trunk?
>>> Hi Pat,
>>> I stared at your patch and the current code for a good 30 minutes, and I
>>> still don't see what is wrong with the current code.
>>> With your patch the number of registers from class CL that the scheduler
>>> has at its disposal for a single-basic-block function will be:
>>> sched_call_regs_num[CL] = ira_class_hard_regs_num[CL] - call_saved_regs_num[CL],
>>> where call_saved_regs_num is number of registers in class CL that need to
>>> be saved in the prologue (i.e., "free" registers). I can see some logic in
>>> sched_call_regs_num[CL] = call_saved_regs_num[CL];
>>> but not in subtracting number of such registers from the number of total
>>> available hard registers.
>>> Am I missing something?
>> Your original patch gave the following reasoning:
>> "At the moment the scheduler does not account for spills in the prologues
>> and restores in the epilogue, which occur from use of call-used registers.
>> The current state is, essentially, optimized for case when there is a hot
>> loop inside the function, and the loop executes significantly more often
>> than the prologue/epilogue. However, on the opposite end, we have a case
>> when the function is just a single non-cyclic basic block, which executes
>> just as often as prologue / epilogue, so spills in the prologue hurt
>> performance as much as spills in the basic block itself. In such a case the
>> scheduler should throttle-down on the number of available registers and try
>> to not go beyond call-clobbered registers."
>> But the misunderstanding is that call-used registers do NOT cause any
>> save/restore. That is to say, call-used == call-clobbered. Your last
>> sentence explains the goal for a single block function, to not go beyond
>> call-clobbered (i.e. call-used) registers, which makes perfect sense. My
>> patch implements that goal by subtracting out call_saved_regs_num (those
>> that require prologue/epilogue save/restore) from the total regs, and using
>> that as the target number of registers to be used for the block.
> I see your point and agree that current code isn't optimal. However, I don't
> think your patch is accurate either. Consider
> https://gcc.gnu.org/onlinedocs/gccint/Register-Basics.html and let's assume
> that FIXED_REGISTERS in class CL is set for a third of the registers, and
> CALL_USED_REGISTERS is set to "1" for another third of registers. So we have
> a third available for zero-cost allocation
> (CALL_USED_REGISTERS-FIXED_REGISTERS), a third available for spill-cost
> allocation (ALL_REGISTERS-CALL_USED_REGISTERS), and a third that is not
> available at all (FIXED_REGISTERS).
> For a non-loop single-basic-block function we should be targeting only the
> third of registers available at zero cost -- correct? This is what the
> current code does, but, apparently, by accident. It seems that the right
> register count can be obtained with:
>   for (int i = 0; i < ira_class_hard_regs_num[cl]; ++i)
> -    if (call_used_regs[ira_class_hard_regs[cl][i]])
> -      ++call_used_regs_num[cl];
> +    if (!call_used_regs[ira_class_hard_regs[cl][i]]
> +        || fixed_regs[ira_class_hard_regs[cl][i]])
> +      ++call_saved_regs_num[cl];
> Does this look correct to you?
Thinking some more, it seems like fixed_regs should not be available to the
scheduler no matter what. Therefore, the number of fixed registers should be
subtracted from ira_class_hard_regs_num[cl] without any scaling (entry_freq /