Wilco Dijkstra <wilco.dijks...@arm.com> writes: > Hi Andrew, > > A few comments on the implementation, I think it can be simplified a lot:
FWIW, I agree with Wilco's comments, except: >> +++ b/gcc/config/aarch64/aarch64.h >> @@ -700,8 +700,9 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE = >> AARCH64_FL_SM_OFF; >> #define DWARF2_UNWIND_INFO 1 >> >> /* Use R0 through R3 to pass exception handling information. */ >> +#define EH_RETURN_DATA_REGISTERS_N 4 >> #define EH_RETURN_DATA_REGNO(N) \ >> - ((N) < 4 ? ((unsigned int) R0_REGNUM + (N)) : INVALID_REGNUM) >> + ((N) < EH_RETURN_DATA_REGISTERS_N ? ((unsigned int) R0_REGNUM + (N)) : >> INVALID_REGNUM) > > It would be useful to add a macro IS_EH_RETURN_REGNUM(regnum) that just checks > the range R0_REGNUM to R0_REGNUM + EH_RETURN_DATA_REGISTERS_N. I've just pushed a patch that adds a global eh_return_data_regs set, so I think we can test that instead. >> @@ -929,6 +928,7 @@ struct GTY (()) aarch64_frame >> outgoing arguments) of each register save slot, or -2 if no save is >> needed. */ >> poly_int64 reg_offset[LAST_SAVED_REGNUM + 1]; >> + bool eh_return_allocated[EH_RETURN_DATA_REGISTERS_N]; > > This doesn't make much sense - besides X0-X3, we also need X5 and X6 for > eh_return. > If these or any of the other temporaries used by epilog are callee-saved > somehow, > things are going horribly wrong already... So what do we gain by doing this? > > >> +++ b/gcc/config/aarch64/aarch64.cc >> @@ -7792,6 +7792,7 @@ aarch64_layout_frame (void) >> >> #define SLOT_NOT_REQUIRED (-2) >> #define SLOT_REQUIRED (-1) >> +#define SLOT_EH_RETURN_REQUIRED (-3) > > I don't see a need for this. > > >> @@ -7949,6 +7950,18 @@ aarch64_layout_frame (void) >> stopping it from being individually shrink-wrapped. */ >> allocate_gpr_slot (R30_REGNUM); >> >> + /* Allocate the eh_return first. */ >> + if (crtl->calls_eh_return) >> + for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; regno++) >> + { >> + int realregno = EH_RETURN_DATA_REGNO (regno); >> + if (known_eq (frame.reg_offset[realregno], SLOT_EH_RETURN_REQUIRED)) >> + { >> + frame.eh_return_allocated[regno] = true; >> + allocate_gpr_slot (realregno); >> + } >> + } > > This change is unnecessary if we just mark the slots with SLOT_REQUIRED. Also, is it necessary to allocate EH data registers first? >> @@ -8035,6 +8048,23 @@ aarch64_layout_frame (void) >> frame.wb_pop_candidate1 = frame.wb_push_candidate1; >> frame.wb_pop_candidate2 = frame.wb_push_candidate2; >> >> + /* EH data registers are not pop canidates. */ >> + if (crtl->calls_eh_return) >> + for (regno = 0; EH_RETURN_DATA_REGNO (regno) != INVALID_REGNUM; >> regno++)> >> + { >> + if (frame.eh_return_allocated[regno] >> + && frame.wb_pop_candidate1 == EH_RETURN_DATA_REGNO (regno)) >> + { >> + frame.wb_pop_candidate1 = frame.wb_pop_candidate2; >> + frame.wb_pop_candidate2 = INVALID_REGNUM; >> + } >> + if (frame.eh_return_allocated[regno] >> + && frame.wb_pop_candidate2 == EH_RETURN_DATA_REGNO (regno)) >> + { >> + frame.wb_pop_candidate2 = INVALID_REGNUM; >> + } >> + } > > This is unnecessary since we can just avoid making them push candidates > if there is no frame chain, eg: > > if ((!crtl->calls_eh_return || frame.emit_frame_chain) && !push_regs.empty () > && known_eq (frame.reg_offset[push_regs[0]], frame.bytes_below_hard_fp)) I agree we should do the check here (and similarly for the second register), rather than fixing it up later. But IMO we should test the register directly: if (!push_regs.empty () && known_eq (frame.reg_offset[push_regs[0]], frame.bytes_below_hard_fp) && (!crtl->calls_eh_return || !TEST_HARD_REG_BIT (eh_return_data_regs, push_regs[0]))) In some ways it seems unfortunate that we're generating two different copies of the epilogue in order to skip two LDPs that (with a bit of work) could easily be done before entering a combined epilogue. But we already have a branch on EH_RETURN_TAKEN_RTX as well, so maybe this is the tipping point at which duplication is worthwhile. Thanks, Richard > @@ -8681,6 +8712,20 @@ aarch64_restore_callee_saves (poly_int64 > bytes_below_sp, > if (frame.is_scs_enabled && regno == LR_REGNUM) > return true; > > + /* Skip the eh return data registers if we are > + returning normally rather than via eh_return. */ > + if (!was_eh_return && crtl->calls_eh_return) > + { > + for (unsigned ehregno = 0; > + EH_RETURN_DATA_REGNO (ehregno) != INVALID_REGNUM; > + ehregno++) > + { > + if (EH_RETURN_DATA_REGNO (ehregno) == regno > + && frame.eh_return_allocated[ehregno]) > + return true; > + } > + } > + > > So this could be something like: > > if (!was_eh_return && crtl->calls_eh_return && IS_EH_RETURN_REGNUM > (regno)) > return true; > > Cheers, > Wilco