On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <[email protected]> wrote:
>
>
> This patch contains two SUBREG-related optimization enabling tweaks to
> the x86 backend.
>
> The first change, to ix86_expand_vector_extract, cures the strange
> -march=cascadelake related non-determinism that affected my new test
> cases last week. Extracting a QImode or HImode element from an SSE
> vector performs a zero-extension to SImode, which is currently
> represented as:
>
> (set (subreg:SI (reg:QI target)) (zero_extend:SI (...)))
>
> Unfortunately, the semantics of this RTL don't quite match what was
> intended. A set of a paradoxical subreg allows the high bits to take
> an arbitrary value (hence the non-determinism). A more correct
> representation would be:
>
> (set (reg:SI temp) (zero_extend:SI (...)))
> (set (reg:QI target) (subreg:QI (reg:SI temp)))
>
> Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to
> indicate that the value is already zero-extended in the SUBREG_REG.
>
> The second change is very similar, which is why I've included it in
> this patch. Currently the early RTL optimizers can produce:
>
> (set (reg:V?? hardreg) (subreg ...))
>
> where this instruction may require a spill/reload from memory when
> the modes aren't tieable. Alas, the presence of the hard register
> prevents combine/gcse etc. from optimizing this away or reusing the
> result, as doing so would increase the lifetime of the hard register
> before reload.
>
> The solution is to treat vector hard registers the same way as the
> x86 backend handles scalar hard registers, and only allow sets from
> pseudos before register allocation, which is achieved by checking
> ix86_hardreg_mov_ok. Hence the above instruction is expanded and
> maintained as:
>
> (set (reg:V?? pseudo) (subreg ...))
> (set (reg:V?? hardreg) (reg:V?? pseudo))
>
> which allows the RTL optimizers freedom to optimize the SUBREG.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures. In theory, my recent "obvious"
> regexp fix to accommodate -march=cascadelake is no longer required, but
> there's no harm leaving the testsuite as it is.
>
> Ok for mainline?
>
>
> 2021-10-11 Roger Sayle <[email protected]>
>
> gcc/ChangeLog
> * config/i386/i386-expand.c (ix86_expand_vector_move): Use a
> pseudo intermediate when moving a SUBREG into a hard register,
> by checking ix86_hardreg_mov_ok.

   /* Make operand1 a register if it isn't already. */
   if (can_create_pseudo_p ()
-      && !register_operand (op0, mode)
-      && !register_operand (op1, mode))
+      && (!ix86_hardreg_mov_ok (op0, op1)
+          || (!register_operand (op0, mode)
+              && !register_operand (op1, mode))))
     {
       rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

ix86_gen_scratch_sse_rtx probably returns a hard register, but here
you want a pseudo register.

> (ix86_expand_vector_extract): Store zero-extended SImode
> intermediate in a pseudo, then set target using a SUBREG_PROMOTED
> annotated subreg.
> * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
> complex (SUBREG) sets of (vector) hard registers before reload, by
> checking ix86_hardreg_mov_ok.
>
>
> Thanks in advance,
> Roger
> --
>
--
BR,
Hongtao