On Thu, May 13, 2021 at 11:43:19AM +0200, Uros Bizjak wrote:
> > > Bootstrapped and regtested on X86_64-linux-gnu{-m32,}
> > > Ok for trunk?
> >
> > Some time ago a support for CLOBBER_HIGH RTX was added (and later
> > removed for some reason). Perhaps we could resurrect the patch for the
> > purpose of ferrying 128bit modes via vzeroupper RTX?
>
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-11/msg01325.html

https://gcc.gnu.org/legacy-ml/gcc-patches/2019-09/msg01468.html
is where it got removed, CCing Richard.

> > +(define_split
> > +  [(match_parallel 0 "vzeroupper_pattern"
> > +     [(unspec_volatile [(const_int 0)] UNSPECV_VZEROUPPER)])]
> > +  "TARGET_AVX && ix86_pre_reload_split ()"
> > +  [(match_dup 0)]
> > +{
> > +  /* When vzeroupper is explictly used, for LRA purpose, make it clear
> > +     the instruction kills sse registers. */
> > +  gcc_assert (cfun->machine->has_explicit_vzeroupper);
> > +  unsigned int nregs = TARGET_64BIT ? 16 : 8;
> > +  rtvec vec = rtvec_alloc (nregs + 1);
> > +  RTVEC_ELT (vec, 0) = gen_rtx_UNSPEC_VOLATILE (VOIDmode,
> > +                                               gen_rtvec (1, const1_rtx),
> > +                                               UNSPECV_VZEROUPPER);
> > +  for (unsigned int i = 0; i < nregs; ++i)
> > +    {
> > +      unsigned int regno = GET_SSE_REGNO (i);
> > +      rtx reg = gen_rtx_REG (V2DImode, regno);
> > +      RTVEC_ELT (vec, i + 1) = gen_rtx_CLOBBER (VOIDmode, reg);
> > +    }
> > +  operands[0] = gen_rtx_PARALLEL (VOIDmode, vec);
> > +})
> >
> > Wouldn't this also kill lower 128bit values that are not touched by
> > vzeroupper? A CLOBBER_HIGH would be more appropriate here.

Yes, it would.  But normally the only xmm* hard regs live across the
explicit user vzeroupper would be local and global register variables,
I think the 1st scheduler etc. shouldn't extend lifetime of the xmm
hard regs across UNSPEC_VOLATILE.

Jakub
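
For illustration only, here is a toy C model of the distinction discussed above. It is not GCC code: the struct and all function names are made up, and each 128-bit half of a vector register is reduced to one integer plus a flag saying whether the compiler may still rely on its value. It only sketches why a plain CLOBBER is more pessimistic than what vzeroupper does in hardware, and what a CLOBBER_HIGH-style annotation would keep.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one 256-bit vector register, split into halves.
   The *_known flags stand for "the compiler may still assume this value".  */
struct vreg_model
{
  uint64_t low;     /* stand-in for bits 0..127 */
  uint64_t high;    /* stand-in for bits 128..255 */
  bool low_known;
  bool high_known;
};

/* Hardware effect of vzeroupper: the upper part is zeroed,
   the low 128 bits are left untouched.  */
void
hw_vzeroupper (struct vreg_model *r)
{
  r->high = 0;
}

/* Simplified view of a plain CLOBBER of the register, as in the quoted
   define_split: nothing held in the register is considered live afterwards,
   including the low 128 bits.  */
void
model_full_clobber (struct vreg_model *r)
{
  r->low_known = false;
  r->high_known = false;
}

/* Simplified view of a CLOBBER_HIGH-style annotation: only the part above
   128 bits is invalidated, the low 128-bit value survives.  */
void
model_clobber_high (struct vreg_model *r)
{
  r->high_known = false;
}
```

With a register initialised as `{1, 2, true, true}`, `hw_vzeroupper` leaves `low` at 1, `model_full_clobber` drops both flags, and `model_clobber_high` drops only `high_known`. The practical question in the thread is whether losing the low half matters, given that few xmm hard registers are expected to be live across an explicit vzeroupper.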