On Mon, Jul 4, 2011 at 7:13 AM, H.J. Lu <hjl.to...@gmail.com> wrote: >>>> In one SSE2 int->float split, when TARGET_USE_VECTOR_CONVERTS is true, >>>> TARGET_INTER_UNIT_MOVES is false and GENERAL_REG_P (op1) is true. we >>>> will get gcc_unreachable. This patch removes TARGET_INTER_UNIT_MOVES >>>> check. OK for trunk? >>> >>> This will result in register allocation failure. Operand 0 of > > That particular sse2_loadld insn matches: > > (insn 49 22 50 5 (set (reg:V4SI 21 xmm0 [83]) > (vec_merge:V4SI (vec_duplicate:V4SI (reg/v:SI 1 dx [orig:64 > test ] [64])) > (const_vector:V4SI [ > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > (const_int 0 [0]) > ]) > (const_int 1 [0x1]))) x.i:11 1365 {vec_setv4si_0} > (nil)) >
Yes, but it should not be generated for !TARGET_INTER_UNIT_MOVES. The constraint should be Yi, but then we don't shadow other alternatives correctly. >>> sse2_loadld pattern has conditional constraint Yi that depends on >>> TARGET_INTER_UNIT_MOVES, so we can't blindly generate sse2_loadld >>> after reload. I'm testing attached patch. >>> >>> BTW: Do you perhaps have a testcase for this problem? >> >> I have a testcase. But it needs a new x86 optimization we are working on it. >> >>> 2011-07-03 Uros Bizjak <ubiz...@gmail.com> >>> >>> PR target/49600 >>> * config/i386/i386.md (SSE2 int->float split): Push operand 1 in >>> general register to memory for !TARGET_INTER_UNIT_MOVES. >>> >> >> I will give it a try. >> > > It doesn't work: I still got Yes, I later noticed that I have changed the wrong pattern (the one with memory clobber) ;( . Attached is the correct patch. Uros.
Index: config/i386/i386.md =================================================================== --- config/i386/i386.md (revision 175786) +++ config/i386/i386.md (working copy) @@ -5022,11 +5022,20 @@ if (GET_CODE (op1) == SUBREG) op1 = SUBREG_REG (op1); - if (GENERAL_REG_P (op1) && TARGET_INTER_UNIT_MOVES) + if (GENERAL_REG_P (op1)) { operands[4] = simplify_gen_subreg (V4SImode, operands[0], <MODE>mode, 0); - emit_insn (gen_sse2_loadld (operands[4], - CONST0_RTX (V4SImode), operands[1])); + if (TARGET_INTER_UNIT_MOVES) + emit_insn (gen_sse2_loadld (operands[4], + CONST0_RTX (V4SImode), operands[1])); + else + { + operands[5] = ix86_force_to_memory (GET_MODE (operands[1]), + operands[1]); + emit_insn (gen_sse2_loadld (operands[4], + CONST0_RTX (V4SImode), operands[5])); + ix86_free_from_memory (GET_MODE (operands[1])); + } } /* We can ignore possible trapping value in the high part of SSE register for non-trapping math. */