On Sat, Mar 5, 2016 at 7:39 AM, Jakub Jelinek <ja...@redhat.com> wrote: > Hi! > > The r222470 commit changed =x into =v constraint in *truncdfsf_fast_mixed. > The problem is that for some tunings we have a splitter > /* For converting DF(xmm2) to SF(xmm1), use the following code instead of > cvtsd2ss: > unpcklpd xmm2,xmm2 ; packed conversion might crash on signaling NaNs > cvtpd2ps xmm2,xmm1 > If the input operand is memory, it attempts to emit sse2_loadlpd > instruction. But, that define_insn doesn't have any v constraints and so we > fail to recognize it. For the vmovsd 2 operand m -> v instruction > *vec_concatv2df implements that too. > So I see 3 options for this: > 1) as the patch does, emit *vec_concatv2df manually > 2) rename *vec_concatv2df to vec_concatv2df and use gen_vec_concatv2df > in the splitter; possibly use it instead of sse2_loadlpd there, because > that insn has uglier/more complex pattern > 3) tweak sse2_loadlpd - add various v alternatives to it, guard them with > avx512vl isa, etc. > > I bet the 3) treatment is desirable and likely many other instructions need > it, but that doesn't sound like stage4 material to me, I find it quite > risky, do you agree? If yes, the following patch can work temporarily > (bootstrapped/regtested on x86_64-linux and i686-linux), or I can do 2), > but in that case I'd like to know your preferences about the suboption > (whether to replace gen_sse2_loadlpd with gen_vec_concatv2df or whether > to use it only for the EXT_REX_SSE_REG_P regs).
Let's go with the option 2) and always generate vec_concatv2df, as we only need it for [v,m,C] alternative. In the long term, we should enhance all patterns with new alternatives, but not in stage-4. Attached (lightly tested) patch that implements option 2) also allows us to simplify splitter enable condition a bit. Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index cb8bcec..ef80d6a 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -4362,9 +4362,8 @@ (match_operand:DF 1 "nonimmediate_operand")))] "TARGET_USE_VECTOR_FP_CONVERTS && optimize_insn_for_speed_p () - && reload_completed && SSE_REG_P (operands[0]) - && (!EXT_REX_SSE_REG_P (operands[0]) - || TARGET_AVX512VL)" + && reload_completed + && SSE_REG_P (operands[0])" [(set (match_dup 2) (vec_concat:V4SF (float_truncate:V2SF @@ -4393,8 +4392,10 @@ emit_insn (gen_vec_dupv2df (operands[4], operands[1])); } else - emit_insn (gen_sse2_loadlpd (operands[4], - CONST0_RTX (V2DFmode), operands[1])); + /* Emit *vec_concatv2df. */ + emit_insn (gen_rtx_SET (operands[4], + gen_rtx_VEC_CONCAT (V2DFmode, operands[1], + CONST0_RTX (DFmode)))); }) ;; It's more profitable to split and then extend in the same register.