Hi!
The r222470 commit changed =x into =v constraint in *truncdfsf_fast_mixed.
The problem is that for some tunings we have a splitter
/* For converting DF(xmm2) to SF(xmm1), use the following code instead of
cvtsd2ss:
unpcklpd xmm2,xmm2 ; packed conversion might crash on signaling NaNs
cvtpd2ps xmm2,xmm1
If the input operand is memory, it attempts to emit sse2_loadlpd
instruction. But, that define_insn doesn't have any v constraints and so we
fail to recognize it. For the vmovsd 2 operand m -> v instruction
*vec_concatv2df implements that too.
So I see 3 options for this:
1) as the patch does, emit *vec_concatv2df manually
2) rename *vec_concatv2df to vec_concatv2df and use gen_vec_concatv2df
in the splitter; possibly use it instead of sse2_loadlpd there, because
that insn has uglier/more complex pattern
3) tweak sse2_loadlpd - add various v alternatives to it, guard them with
avx512vl isa, etc.
I bet the 3) treatment is desirable and likely many other instructions need
it, but that doesn't sound like stage4 material to me, I find it quite
risky, do you agree? If yes, the following patch can work temporarily
(bootstrapped/regtested on x86_64-linux and i686-linux), or I can do 2),
but in that case I'd like to know your preferences about the suboption
(whether to replace gen_sse2_loadlpd with gen_vec_concatv2df or whether
to use it only for the EXT_REX_SSE_REG_P regs).
2016-03-04 Jakub Jelinek <[email protected]>
PR target/70086
* config/i386/i386.md (truncdfsf2 splitter): Handle
EXT_REX_SSE_REG_P destination with memory input.
* gcc.target/i386/pr70086-1.c: New test.
* gcc.target/i386/pr70086-2.c: New test.
--- gcc/config/i386/i386.md.jj 2016-03-02 14:09:50.000000000 +0100
+++ gcc/config/i386/i386.md 2016-03-04 22:56:32.206840674 +0100
@@ -4392,6 +4392,11 @@ (define_split
operands[4] = simplify_gen_subreg (V2DFmode, operands[1], DFmode, 0);
emit_insn (gen_vec_dupv2df (operands[4], operands[1]));
}
+ else if (EXT_REX_SSE_REG_P (operands[4]))
+ /* Emit *vec_concatv2df. */
+ emit_insn (gen_rtx_SET (operands[4],
+ gen_rtx_VEC_CONCAT (V2DFmode, operands[1],
+ CONST0_RTX (DFmode))));
else
emit_insn (gen_sse2_loadlpd (operands[4],
CONST0_RTX (V2DFmode), operands[1]));
--- gcc/testsuite/gcc.target/i386/pr70086-1.c.jj 2016-03-04
23:01:07.447081169 +0100
+++ gcc/testsuite/gcc.target/i386/pr70086-1.c 2016-03-04 23:00:27.000000000
+0100
@@ -0,0 +1,11 @@
+/* PR target/70086 */
+/* { dg-do compile } */
+/* { dg-options "-mtune=barcelona -mavx512vl -ffloat-store" } */
+
+float
+foo (float a, float b, double c, float d, double e, float f)
+{
+ e -= d;
+ d *= e;
+ return e + d;
+}
--- gcc/testsuite/gcc.target/i386/pr70086-2.c.jj 2016-03-04
23:01:07.447081169 +0100
+++ gcc/testsuite/gcc.target/i386/pr70086-2.c 2016-03-04 23:00:27.000000000
+0100
@@ -0,0 +1,12 @@
+/* PR target/70086 */
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -mtune=barcelona -mavx512vl" } */
+
+float
+foo (double *p)
+{
+ register float xmm16 __asm ("xmm16");
+ xmm16 = *p;
+ asm volatile ("" : "+v" (xmm16));
+ return xmm16;
+}
Jakub