https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93039

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2020-01-08
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |uros at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
STV doesn't recognize

(insn 7 6 11 2 (parallel [
            (set (subreg:SI (reg:SF 84 [ <retval> ]) 0)
                (and:SI (subreg:SI (reg:SF 88) 0)
                    (const_int 2147483647 [0x7fffffff])))
            (clobber (reg:CC 17 flags))
        ]) "t.c":5:13 444 {*andsi_1}
     (expr_list:REG_UNUSED (reg:CC 17 flags)
        (expr_list:REG_DEAD (reg:SF 88)
            (nil))))

it has

  if (!REG_P (XEXP (src, 0))
      && !MEM_P (XEXP (src, 0))
      && !CONST_INT_P (XEXP (src, 0))
      /* Check for andnot case.  */
      && (GET_CODE (src) != AND
          || GET_CODE (XEXP (src, 0)) != NOT
          || !REG_P (XEXP (XEXP (src, 0), 0))))
      return false;

and thus doesn't allow punning subregs.  OTOH I wonder why the above
isn't matched by a SImode SSE op ... (yeah, well, we don't have that).
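
For reference, a punning sign-bit clear like the one in insn 7 can be
written in C roughly as below.  This is a hypothetical reconstruction, not
the PR's t.c; the function name abs_via_bits is made up, and I have not
checked that this exact source produces the same RTL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (float) == sizeof (uint32_t),
               "sketch assumes a 32-bit float");

/* Hypothetical example: clear the sign bit of a float by treating its
   bit pattern as a 32-bit integer.  */
float
abs_via_bits (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);  /* reinterpret the float bits as SI */
  bits &= 0x7fffffffu;              /* same mask as the and:SI above */
  memcpy (&x, &bits, sizeof bits);  /* reinterpret the result as SF */
  return x;
}
```

The memcpy calls are only there to make the punning well defined in C;
the interesting part is the integer AND on a value that lives in an SF
register both before and after.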

If I "fix" STV with

Index: gcc/config/i386/i386-features.c
===================================================================
--- gcc/config/i386/i386-features.c     (revision 280006)
+++ gcc/config/i386/i386-features.c     (working copy)
@@ -1365,7 +1365,7 @@ general_scalar_to_vector_candidate_p (rt
       || GET_MODE (dst) != mode)
     return false;

-  if (!REG_P (dst) && !MEM_P (dst))
+  if (!REG_P (dst) && !SUBREG_P (dst) && !MEM_P (dst))
     return false;

   switch (GET_CODE (src))
@@ -1422,6 +1422,7 @@ general_scalar_to_vector_candidate_p (rt
     }

   if (!REG_P (XEXP (src, 0))
+      && !SUBREG_P (XEXP (src, 0))
       && !MEM_P (XEXP (src, 0))
       && !CONST_INT_P (XEXP (src, 0))
       /* Check for andnot case.  */

I see

Building chain #1...
  Adding insn 7 to chain #1
  r84 use in insn 11 isn't convertible
  Mark r84 def in insn 7 as requiring both modes in chain #1
  r88 def in insn 14 isn't convertible
  Mark r88 def in insn 14 as requiring both modes in chain #1
Collected chain #1...
  insns: 7
  defs to convert: r84, r88
Computing gain for chain #1...
  Instruction gain -6 for     7: {r84:SF#0=r88:SF#0&0x7fffffff;clobber flags:CC;}
      REG_UNUSED flags:CC
      REG_DEAD r88:SF
  Instruction conversion gain: -6
  Registers conversion cost: 12
  Total gain: -18
Chain #1 conversion is not profitable

So besides STV not handling the subregs correctly for costing, the gain
computed for the actual instruction is negative as well (likely because
of the cost of loading the constant).  STV also doesn't credit any
"gain" when an existing conversion becomes unnecessary.

The question is: for which CPUs is it actually faster to use SSE?
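
To spell out the arithmetic in the dump: the total is the instruction
conversion gain minus the registers conversion cost, -6 - 12 = -18.  A
minimal model of that decision, with made-up names (struct chain_cost,
total_gain, profitable_p) that are not the ones used in
i386-features.c, and assuming a chain is only converted when the total
is positive:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model only; the names are hypothetical, not GCC's.  */
struct chain_cost
{
  int insn_gain;      /* "Instruction conversion gain" in the dump */
  int reg_conv_cost;  /* "Registers conversion cost" in the dump */
};

/* Total gain as reported in the dump: gain minus conversion cost.  */
int
total_gain (struct chain_cost c)
{
  assert (c.reg_conv_cost >= 0);  /* a conversion cost is never negative */
  return c.insn_gain - c.reg_conv_cost;
}

/* Assumption: conversion happens only for a strictly positive total.  */
bool
profitable_p (struct chain_cost c)
{
  return total_gain (c) > 0;
}
```

With the values from chain #1, total_gain ((struct chain_cost) { -6, 12 })
is -18, so even a zero registers conversion cost would leave the chain
unprofitable until the per-instruction gain itself is fixed.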
