On Sat, Mar 5, 2016 at 7:39 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> The r222470 commit changed =x into =v constraint in *truncdfsf_fast_mixed.
> The problem is that for some tunings we have a splitter
> /* For converting DF(xmm2) to SF(xmm1), use the following code instead of
>    cvtsd2ss:
>       unpcklpd xmm2,xmm2   ; packed conversion might crash on signaling NaNs
>       cvtpd2ps xmm2,xmm1
> If the input operand is memory, it attempts to emit sse2_loadlpd
> instruction.  But, that define_insn doesn't have any v constraints and so we
> fail to recognize it.  For the vmovsd 2 operand m -> v instruction
> *vec_concatv2df implements that too.
> So I see 3 options for this:
> 1) as the patch does, emit *vec_concatv2df manually
> 2) rename *vec_concatv2df to vec_concatv2df and use gen_vec_concatv2df
>    in the splitter; possibly use it instead of sse2_loadlpd there, because
>    that insn has uglier/more complex pattern
> 3) tweak sse2_loadlpd - add various v alternatives to it, guard them with
>    avx512vl isa, etc.
>
> I bet the 3) treatment is desirable and likely many other instructions need
> it, but that doesn't sound like stage4 material to me, I find it quite
> risky, do you agree?  If yes, the following patch can work temporarily
> (bootstrapped/regtested on x86_64-linux and i686-linux), or I can do 2),
> but in that case I'd like to know your preferences about the suboption
> (whether to replace gen_sse2_loadlpd with gen_vec_concatv2df or whether
> to use it only for the EXT_REX_SSE_REG_P regs).

Let's go with the option 2) and always generate vec_concatv2df, as we
only need it for [v,m,C] alternative. In the long term, we should
enhance all patterns with new alternatives, but not in stage-4.

Attached (lightly tested) patch that implements option 2) also allows
us to simplify splitter enable condition a bit.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index cb8bcec..ef80d6a 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -4362,9 +4362,8 @@
          (match_operand:DF 1 "nonimmediate_operand")))]
   "TARGET_USE_VECTOR_FP_CONVERTS
    && optimize_insn_for_speed_p ()
-   && reload_completed && SSE_REG_P (operands[0])
-   && (!EXT_REX_SSE_REG_P (operands[0])
-       || TARGET_AVX512VL)"
+   && reload_completed
+   && SSE_REG_P (operands[0])"
    [(set (match_dup 2)
         (vec_concat:V4SF
           (float_truncate:V2SF
@@ -4393,8 +4392,10 @@
       emit_insn (gen_vec_dupv2df (operands[4], operands[1]));
     }
   else
-    emit_insn (gen_sse2_loadlpd (operands[4],
-                                CONST0_RTX (V2DFmode), operands[1]));
+    /* Emit *vec_concatv2df.  */
+    emit_insn (gen_rtx_SET (operands[4],
+                           gen_rtx_VEC_CONCAT (V2DFmode, operands[1],
+                                               CONST0_RTX (DFmode))));
 })
 
 ;; It's more profitable to split and then extend in the same register.

Reply via email to