On Mon, May 17, 2021 at 5:56 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> writes: > > On Fri, May 14, 2021 at 10:27 AM Hongtao Liu <crazy...@gmail.com> wrote: > >> > >> On Thu, May 13, 2021 at 7:52 PM Richard Sandiford > >> <richard.sandif...@arm.com> wrote: > >> > > >> > Jakub Jelinek <ja...@redhat.com> writes: > >> > > On Thu, May 13, 2021 at 12:32:26PM +0100, Richard Sandiford wrote: > >> > >> Jakub Jelinek <ja...@redhat.com> writes: > >> > >> > On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote: > >> > >> >> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,} > >> > >> >> > > Ok for trunk? > >> > >> >> > > >> > >> >> > Some time ago a support for CLOBBER_HIGH RTX was added (and later > >> > >> >> > removed for some reason). Perhaps we could resurrect the patch > >> > >> >> > for the > >> > >> >> > purpose of ferrying 128bit modes via vzeroupper RTX? > >> > >> >> > >> > >> >> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html > >> > >> > > >> > >> > https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html > >> > >> > is where it got removed, CCing Richard. > >> > >> > >> > >> Yeah. Initially clobber_high seemed like the best appraoch for > >> > >> handling the tlsdesc thing, but in practice it was too difficult > >> > >> to shoe-horn the concept in after the fact, when so much rtl > >> > >> infrastructure wasn't prepared to deal with it. The old support > >> > >> didn't handle all cases and passes correctly, and handled others > >> > >> suboptimally. > >> > >> > >> > >> I think it would be worth using the same approach as > >> > >> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01466.html for > >> > >> vzeroupper: represent the instructions as call_insns in which the > >> > >> call has a special vzeroupper ABI. I think that's likely to lead > >> > >> to better code than clobber_high would (or at least, it did for > >> > >> tlsdesc). > >> > >> From an implementation perspective, I guess you're meaning we should > >> implement TARGET_INSN_CALLEE_ABI and TARGET_FNTYPE_ABI in the i386 > >> backend. > >> > > When I implemented the vzeroupper pattern as call_insn and defined > > TARGET_INSN_CALLEE_ABI for it, I got several failures. they're related > > to 2 parts > > > > 1. requires_stack_frame_p return true for vzeroupper which should be false. > > 2. in subst_stack_regs, vzeroupper shouldn't kill arguments > > > > I've tried a rough patch like below, it works for those failures, > > unfortunately, I don't have an arm machine to test, so I want to ask > > would the below change break something in the arm backend? > > ABI id 0 just means the default ABI. Real calls can use other ABIs > besides the default. That said… > > > modified gcc/reg-stack.c > > @@ -174,6 +174,7 @@ > > #include "reload.h" > > #include "tree-pass.h" > > #include "rtl-iter.h" > > +#include "function-abi.h" > > > > #ifdef STACK_REGS > > > > @@ -2385,7 +2386,7 @@ subst_stack_regs (rtx_insn *insn, stack_ptr regstack) > > bool control_flow_insn_deleted = false; > > int i; > > > > - if (CALL_P (insn)) > > + if (CALL_P (insn) && insn_callee_abi (insn).id () == 0) > > { > > int top = regstack->top; > > …reg-stack.c is effectively x86-specific code, so checking id 0 here > wouldn't affect anything else. It doesn't feel very future-proof > though, since x86 could use ABIs other than 0 for real calls in future. > > AIUI the property that matters here isn't the ABI, but that the target > of the call doesn't reference stack registers. That can be true for > real calls too, with -fipa-ra. > > > modified gcc/shrink-wrap.c > > @@ -58,7 +58,12 @@ requires_stack_frame_p (rtx_insn *insn, > > HARD_REG_SET prologue_used, > > unsigned regno; > > > > if (CALL_P (insn)) > > - return !SIBLING_CALL_P (insn); > > + { > > + if (insn_callee_abi (insn).id() != 0) > > + return false; > > + else > > + return !SIBLING_CALL_P (insn); > > + } > > TBH I'm not sure why off-hand this function needs to treat non-sibling > calls specially, rather than rely on normal DF information. Calls have > a use of the stack pointer, so we should return true for that reason: > > /* The stack ptr is used (honorarily) by a CALL insn. */ > df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i], > NULL, bb, insn_info, DF_REF_REG_USE, > DF_REF_CALL_STACK_USAGE | flags); > > I guess this is something we should suppress for fake calls though. > > It looks like the rtx “used” flag is unused for INSNs, so we could > use that as a CALL_INSN flag that indicates a fake call. We could just > need to make: > > /* For all other RTXes clear the used flag on the copy. */ > RTX_FLAG (copy, used) = 0; > > conditional on !INSN_P. > I got another error in
@@ -83,6 +83,9 @@ control_flow_insn_p (const rtx_insn *insn) return true; case CALL_INSN: + /* CALL_INSN use "used" flag to indicate it's a fake call. */ + if (RTX_FLAG (insn, used)) + break; and performance issue in modified gcc/final.c @@ -4498,7 +4498,8 @@ leaf_function_p (void) for (insn = get_insns (); insn; insn = NEXT_INSN (insn)) { if (CALL_P (insn) - && ! SIBLING_CALL_P (insn)) + && ! SIBLING_CALL_P (insn) + && !RTX_FLAG (insn, used)) return 0; if (NONJUMP_INSN_P (insn) Also i grep CALL_P or CALL_INSN in GCC source codes, there are many places which hold the assumption CALL_P/CALL_INSN is a real call. Considering that vzeroupper is used a lot on the i386 backend, I'm a bit worried that this implementation solution will be a bottomless pit. > Thanks, > Richard -- BR, Hongtao