On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > This patch contains two SUBREG-related optimization enabling tweaks to > the x86 backend. > > The first change, to ix86_expand_vector_extract, cures the strange > -march=cascadelake related non-determinism that affected my new test > cases last week. Extracting a QImode or HImode element from an SSE > vector performs a zero-extension to SImode, which is currently > represented as: > > (set (subreg:SI (reg:QI target)) (zero_extend:SI (...)) > > Unfortunately, the semantics of this RTL doesn't quite match what was > intended. A set of a paradoxical subreg allows the high-bits to take > an arbitrary value (hence the non-determinism). A more correct > representation should be: > > (set (reg:SI temp) (zero_extend:SI (...)) > (set (reg:QI target) (subreg:QI (reg:SI temp)) > > Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to > indicate that value is already zero-extended in the SUBREG_REG. > > The second change is very similar, which is why I've included it in > this patch, where currently the early RTL optimizers can produce: > > (set (reg:V?? hardreg) (subreg ...)) > > where this instruction may require a spill/reload from memory when > the modes aren't tieable. Alas the presence of the hard register > prevents combine/gcse etc. optimizing this away, or reusing the result > which would increase the lifetime of the hard register before reload. > > The solution is to treat vector hard registers the same way as the > x86 backend handles scalar hard registers, and only allow sets from > pseudos before register allocation, which is achieved by checking > ix86_hardreg_mov_ok. Hence the above instruction is expanded and > maintained as: > > (set (reg:V?? pseudo) (subreg ...)) > (set (reg:V?? hardreg) (reg:V?? pseudo)) > > which allows the RTL optimizers freedom to optimize the SUBREG. > > > This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap" > and "make -k check" with no new failures. In theory, my recent "obvious" > regexp fix to accommodate -march=cascadelake is no longer required, but > there's no harm leaving the testsuite as it is. > > Ok for mainline? > > > 2021-10-11 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-expand.c (ix86_expand_vector_move): Use a > pseudo intermediate when moving a SUBREG into a hard register, > by checking ix86_hardreg_mov_ok.
/* Make operand1 a register if it isn't already. */ if (can_create_pseudo_p () - && !register_operand (op0, mode) - && !register_operand (op1, mode)) + && (!ix86_hardreg_mov_ok (op0, op1) + || (!register_operand (op0, mode) + && !register_operand (op1, mode)))) { rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0)); ix86_gen_scratch_sse_rtx probably returns a hard register, but here you want a pseudo register. > (ix86_expand_vector_extract): Store zero-extended SImode > intermediate in a pseudo, then set target using a SUBREG_PROMOTED > annotated subreg. > * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating > complex (SUBREG) sets of (vector) hard registers before reload, by > checking ix86_hardreg_mov_ok. > > > Thanks in advance, > Roger > -- > -- BR, Hongtao