Hi!

The r222470 commit changed the =x constraint into =v in
*truncdfsf_fast_mixed.  The problem is that for some tunings we have
a splitter

  /* For converting DF(xmm2) to SF(xmm1), use the following code instead of
     cvtsd2ss:
       unpcklpd xmm2,xmm2   ; packed conversion might crash on signaling NaNs
       cvtpd2ps xmm2,xmm1

If the input operand is memory, it attempts to emit the sse2_loadlpd
instruction.  But that define_insn doesn't have any v constraints, so
we fail to recognize the result.  The 2-operand vmovsd m -> v form is
also implemented by *vec_concatv2df.

So I see 3 options for this:
1) as the patch does, emit *vec_concatv2df manually
2) rename *vec_concatv2df to vec_concatv2df and use gen_vec_concatv2df
   in the splitter; possibly use it instead of sse2_loadlpd there, because
   that insn has an uglier/more complex pattern
3) tweak sse2_loadlpd - add various v alternatives to it, guard them
   with avx512vl isa, etc.
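For readers less familiar with the sequence quoted from the splitter comment, here is a minimal C sketch of the same idea using SSE2 intrinsics from <emmintrin.h>. The function names trunc_df_sf_packed_mem and trunc_df_sf_packed_reg are made up for this illustration and are not part of GCC; the intrinsics only approximate what the splitter emits, and the compiler is free to select different instructions for them.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Memory input: load the double into the low lane and zero the high lane,
   so the packed conversion never sees an uninitialized upper element.
   (Hypothetical helper name, for illustration only.)  */
static float
trunc_df_sf_packed_mem (const double *p)
{
  __m128d v = _mm_load_sd (p);      /* low = *p, high = 0.0 */
  __m128 f = _mm_cvtpd_ps (v);      /* cvtpd2ps: converts both lanes */
  return _mm_cvtss_f32 (f);         /* take the low float */
}

/* Register input: duplicate the low lane into the high lane first, which
   mirrors the "unpcklpd xmm2,xmm2" step in the comment.
   (Hypothetical helper name, for illustration only.)  */
static float
trunc_df_sf_packed_reg (double d)
{
  __m128d v = _mm_set_sd (d);       /* low = d, high = 0.0 */
  v = _mm_unpacklo_pd (v, v);       /* low = d, high = d */
  return _mm_cvtss_f32 (_mm_cvtpd_ps (v));
}
```

In both helpers the upper lane holds a known value before the packed conversion runs, which is the point of the extra load or unpack step. The memory-input helper corresponds to the case the patch is concerned with.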

I bet the 3) treatment is desirable and likely many other instructions need
it, but that doesn't sound like stage4 material to me, I find it quite
risky, do you agree?  If yes, the following patch can work temporarily
(bootstrapped/regtested on x86_64-linux and i686-linux), or I can do 2),
but in that case I'd like to know your preferences about the suboption
(whether to replace gen_sse2_loadlpd with gen_vec_concatv2df or whether
to use it only for the EXT_REX_SSE_REG_P regs).

2016-03-04  Jakub Jelinek  <ja...@redhat.com>

        PR target/70086
        * config/i386/i386.md (truncdfsf2 splitter): Handle
        EXT_REX_SSE_REG_P destination with memory input.

        * gcc.target/i386/pr70086-1.c: New test.
        * gcc.target/i386/pr70086-2.c: New test.

--- gcc/config/i386/i386.md.jj	2016-03-02 14:09:50.000000000 +0100
+++ gcc/config/i386/i386.md	2016-03-04 22:56:32.206840674 +0100
@@ -4392,6 +4392,11 @@ (define_split
           operands[4] = simplify_gen_subreg (V2DFmode, operands[1], DFmode, 0);
           emit_insn (gen_vec_dupv2df (operands[4], operands[1]));
         }
+      else if (EXT_REX_SSE_REG_P (operands[4]))
+        /* Emit *vec_concatv2df.  */
+        emit_insn (gen_rtx_SET (operands[4],
+                                gen_rtx_VEC_CONCAT (V2DFmode, operands[1],
+                                                    CONST0_RTX (DFmode))));
       else
         emit_insn (gen_sse2_loadlpd (operands[4],
                                      CONST0_RTX (V2DFmode), operands[1]));
--- gcc/testsuite/gcc.target/i386/pr70086-1.c.jj	2016-03-04 23:01:07.447081169 +0100
+++ gcc/testsuite/gcc.target/i386/pr70086-1.c	2016-03-04 23:00:27.000000000 +0100
@@ -0,0 +1,11 @@
+/* PR target/70086 */
+/* { dg-do compile } */
+/* { dg-options "-mtune=barcelona -mavx512vl -ffloat-store" } */
+
+float
+foo (float a, float b, double c, float d, double e, float f)
+{
+  e -= d;
+  d *= e;
+  return e + d;
+}
--- gcc/testsuite/gcc.target/i386/pr70086-2.c.jj	2016-03-04 23:01:07.447081169 +0100
+++ gcc/testsuite/gcc.target/i386/pr70086-2.c	2016-03-04 23:00:27.000000000 +0100
@@ -0,0 +1,12 @@
+/* PR target/70086 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mtune=barcelona -mavx512vl" } */
+
+float
+foo (double *p)
+{
+  register float xmm16 __asm ("xmm16");
+  xmm16 = *p;
+  asm volatile ("" : "+v" (xmm16));
+  return xmm16;
+}

	Jakub