Vineet Gupta <vine...@rivosinc.com> writes:
> On 8/6/24 17:36, Vineet Gupta wrote:
>> I'm currently pursuing a different trail which comes from the observation
>> that initial model setup concludes that pressure is 28 so with 27
>> allocable regs we are bound to spill one.
>> More on that after I find something concrete.
>
> (caveat: I enabled -fomit-frame-pointer for both risc-v and aarch64)
>
> Observation:  So on risc-v, sched1's very first pressure dump starts off with 
> initial pressure 1
>
> ;;   ======================================================
> ;;   -- basic block 2 from 6 to 174 -- before reload
> ;;   ======================================================
> ;;    | idx insn | mpri hght dpth prio |          # model_record_pressures ()
> ;;    |   0    6 |    0    3    0    5 | r154=high(`j')  GR_REGS:[1,+1]
>                                                                  ^^^
>
> While on aarch64 it starts off with 0.
>
> ;;    |   0    6 |    0    3    0    6 | r122=high(`j') GENERAL_REGS:[0,+1] 
> FP_REGS:[0,+0] PR_LO_REGS:[0,+0] PR_HI_REGS:[0,+0]
>                                                                      ^^^
>
> This seems to be happening because of HARD_FP (regno 8)
>
> model_start_schedule ()
>    initiate_reg_pressure_info (df_get_live_in (bb))
>        EXECUTE_IF_SET_IN_BITMAP (live, 0, j, bi)
>            mark_regno_birth_or_death (.. j )
>               if (! TEST_HARD_REG_BIT (ira_no_alloc_regs, regno))
>                   bitmap_set_bit (live, regno)
>
> For RISC-V, the loop above executes for regno 2 (SP), 8 (HARD_FP), 64 (FP), 
> 65 (Arg).
>
> The DF infra (before reload) sets up artificial usage for HARD_FP : see 
> df_get_regular_block_artificial_uses () hence it shows up in df_get_live_in 
> (bb)
>
> On RISC-V, FIXED_REGISTERS omits FP and consequently ira_no_alloc_regs
> doesn't include HARD_FP. This seems sensible (or at least intuitive) since
> the register allocator is allowed to use HARD_FP (which, due to
> -fomit-frame-pointer, becomes the first callee-saved reg S0).
>
> (gdb) p/x this_target_ira_int->x_no_unit_alloc_regs
> $1 = {elts = {0x1f, 0xffffffffffffffff}}    <-- bit 8 for HARD_FP not set
>
> On aarch64, HARD_FP regno 29 is marked in FIXED_REGISTERS and is thus
> present in ira_no_alloc_regs
>
> (gdb) p/x this_target_ira_int->x_no_unit_alloc_regs
> $1 = {elts = {0xa0000000, 0x0}}
>
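
To make the arithmetic above concrete, here is a rough Python sketch of the
live-in walk. It is a simplified model, not GCC's code: it ignores pressure
classes and assumes the gdb dumps show the same set that the
TEST_HARD_REG_BIT check consults. The helper names are made up for
illustration, and the aarch64 live-in list is an assumption (the message
gives only the mask for that target).

```python
# Sketch only: a simplified model of the live-in walk, not GCC's implementation.

def regs_in_mask(elts, bits_per_elt=64):
    """Return the register numbers whose bit is set in an array of words.

    Element 0 is taken to hold regnos 0..63, element 1 regnos 64..127.
    """
    regs = set()
    for word_index, word in enumerate(elts):
        for bit in range(bits_per_elt):
            if word & (1 << bit):
                regs.add(word_index * bits_per_elt + bit)
    return regs


def initial_pressure(live_in, no_alloc_elts):
    """Count live-in registers that are not in the no-alloc set."""
    no_alloc = regs_in_mask(no_alloc_elts)
    return sum(1 for regno in live_in if regno not in no_alloc)


# RISC-V: live-in regnos and mask are both taken from the message.
riscv_no_alloc = [0x1F, 0xFFFFFFFFFFFFFFFF]
riscv_live_in = [2, 8, 64, 65]      # SP, HARD_FP, FP, Arg

# aarch64: mask from the message; the live-in list is an ASSUMPTION.
aarch64_no_alloc = [0xA0000000, 0x0]
aarch64_live_in = [29, 31]          # assumed: HARD_FP and SP only

print(sorted(regs_in_mask(aarch64_no_alloc)))               # [29, 31]
print(initial_pressure(riscv_live_in, riscv_no_alloc))      # 1
print(initial_pressure(aarch64_live_in, aarch64_no_alloc))  # 0
```

In this model only regno 8 survives the filter on RISC-V, which matches the
initial GR_REGS pressure of 1 in the dump.
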
> So I don't understand 2 things:
>
> 1. Why is aarch64 reserving away HARD_FP (at least from ira) when the
> user is clearly saying -fomit-frame-pointer? (It seems this remains even if
> I disable exceptions, asynch unwind, etc.)

That decision was made before my time, but I think it's because,
per the ABI, code is allowed to assume that r29 points to a valid
frame chain record at all times, and that following the chain up
the stack will not crash (assuming an uncorrupted stack, of course).
On AArch64, -fomit-frame-pointer says that it's ok to skip setting up
x29 (and the frame chain) for a function that doesn't inherently need
a frame pointer, but it doesn't mean that we can break backtracing for
functions further up the stack.
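
As a toy illustration of the frame chain point (not real AArch64 unwinding),
the Python sketch below models frame records as (previous frame pointer,
return address) pairs, with a previous pointer of 0 ending the chain. The
addresses, labels and function name are invented for the example.

```python
# Sketch only: a toy stack, not real AArch64 unwinding.

def walk_frame_chain(memory, fp, max_frames=64):
    """Follow saved-frame-pointer links starting at fp.

    memory maps a frame-record address to (previous_fp, return_address).
    A previous_fp of 0 marks the outermost frame.
    Raises KeyError if fp does not point at a frame record.
    """
    return_addresses = []
    while fp != 0 and len(return_addresses) < max_frames:
        previous_fp, return_address = memory[fp]
        return_addresses.append(return_address)
        fp = previous_fp
    return return_addresses


# Toy call stack main -> f -> g, where g did not create its own record.
memory = {
    0x7000: (0, "ret_into_start"),       # main's record
    0x6000: (0x7000, "ret_into_main"),   # f's record
}

# g skipped the setup but left x29 alone, so x29 still points at f's record.
x29_untouched = 0x6000
print(walk_frame_chain(memory, x29_untouched))
# ['ret_into_main', 'ret_into_start']

# If g had used x29 as a scratch register, the walk would start from garbage.
x29_clobbered = 0x1234
try:
    walk_frame_chain(memory, x29_clobbered)
except KeyError:
    print("chain broken")
```

Skipping the record for g loses only g's own entry in the backtrace, while
clobbering x29 loses the frames above it as well.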

> 2. On RISC-V, sched1 is counterintuitively assuming HARD_FP is live due to
> the weird interaction of the DF infra (which always marks HARD_FP with
> artificial def) and ira_no_alloc_regs.

In general, it isn't possible to predict at this stage whether the hard
frame pointer will be needed, even for -fomit-frame-pointer.  The final
decision is made during LRA, which in the worst case has to iterate through
several elimination attempts.

So whatever we do here will be wrong for some cases.  In some ways,
assuming that the hard frame pointer will be needed is the conservative
option; if we instead assumed that it wasn't needed, we'd be more
willing to move code around to make use of that (supposed) extra register.

Thanks,
Richard
