On Mon, Oct 11, 2021 at 4:55 PM Roger Sayle <ro...@nextmovesoftware.com> wrote:
>
>
> This patch contains two SUBREG-related optimization enabling tweaks to
> the x86 backend.
>
> The first change, to ix86_expand_vector_extract, cures the strange
> -march=cascadelake related non-determinism that affected my new test
> cases last week.  Extracting a QImode or HImode element from an SSE
> vector performs a zero-extension to SImode, which is currently
> represented as:
>
> (set (subreg:SI (reg:QI target)) (zero_extend:SI (...))
>
> Unfortunately, the semantics of this RTL doesn't quite match what was
> intended.  A set of a paradoxical subreg allows the high-bits to take
> an arbitrary value (hence the non-determinism).  A more correct
> representation should be:
>
> (set (reg:SI temp) (zero_extend:SI (...))
> (set (reg:QI target) (subreg:QI (reg:SI temp))
>
> Optionally with the SUBREG rtx annotated as SUBREG_PROMOTED_VAR_P to
> indicate that value is already zero-extended in the SUBREG_REG.
>
> The second change is very similar, which is why I've included it in
> this patch, where currently the early RTL optimizers can produce:
>
> (set (reg:V?? hardreg) (subreg ...))
>
> where this instruction may require a spill/reload from memory when
> the modes aren't tieable.  Alas the presence of the hard register
> prevents combine/gcse etc. optimizing this away, or reusing the result
> which would increase the lifetime of the hard register before reload.
>
> The solution is to treat vector hard registers the same way as the
> x86 backend handles scalar hard registers, and only allow sets from
> pseudos before register allocation, which is achieved by checking
> ix86_hardreg_mov_ok.  Hence the above instruction is expanded and
> maintained as:
>
> (set (reg:V?? pseudo) (subreg ...))
> (set (reg:V?? hardreg) (reg:V?? pseudo))
>
> which allows the RTL optimizers freedom to optimize the SUBREG.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
> and "make -k check" with no new failures.  In theory, my recent "obvious"
> regexp fix to accommodate -march=cascadelake is no longer required, but
> there's no harm leaving the testsuite as it is.
>
> Ok for mainline?
>
>
> 2021-10-11  Roger Sayle  <ro...@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-expand.c (ix86_expand_vector_move):  Use a
>         pseudo intermediate when moving a SUBREG into a hard register,
>         by checking ix86_hardreg_mov_ok.

   /* Make operand1 a register if it isn't already.  */
   if (can_create_pseudo_p ()
-      && !register_operand (op0, mode)
-      && !register_operand (op1, mode))
+      && (!ix86_hardreg_mov_ok (op0, op1)
+  || (!register_operand (op0, mode)
+      && !register_operand (op1, mode))))
     {
       rtx tmp = ix86_gen_scratch_sse_rtx (GET_MODE (op0));

ix86_gen_scratch_sse_rtx probably returns a hard register, but here
you want a pseudo register.


>         (ix86_expand_vector_extract): Store zero-extended SImode
>         intermediate in a pseudo, then set target using a SUBREG_PROMOTED
>         annotated subreg.
>         * config/i386/sse.md (mov<VMOVE>_internal): Prevent CSE creating
>         complex (SUBREG) sets of (vector) hard registers before reload, by
>         checking ix86_hardreg_mov_ok.
>
>
> Thanks in advance,
> Roger
> --
>


--
BR,
Hongtao

Reply via email to