The following splits out the change that makes the live-range split / mode change copy pseudos vector mode pseudos. I've tried reproducing the issue I faced with SImode chains but failed likely because we're never using simple moves in the end while for SImode we can end up with
(set (subreg:V4SI (reg:SI 42) 0) (subreg:V4SI (reg:SI 41) 0)) which we happily propagate out. So pursuing this patch independently doesn't make much sense. Still the main change is in dimode_scalar_chain::make_vector_copies where 'vreg' is now V2DImode instead of DImode and the change to emit_move_insn (vreg, tmp) hints at the change done to the above problematic insn. The rest of the changes deal with some regs already being in the appropriate vector mode. I realize I might have avoided my original issue by emitting not the above reg-reg move but (set (subreg:V4SI (reg:SI 42) 0) (vec_merge:V4SI (vec_duplicate:V4SI (reg:SI 41)) (const_vec [0 0 0 0]) (const_1))) but I didn't try (the full patch contained a testcase for the issue). Still it seems odd to use DImode pseudos when all uses are wrapped in (subreg:V2DI ...). As (unrelated) change I propose to error with fatal_insn_not_found in case recog doesn't recognize the altered insns which make it easier to spot errors early. Bootstrap / regtest running on x86_64-unknown-linux-gnu ontop of [1/2], OK for trunk if that succeeds? Thanks, Richard. 2019-08-13 Richard Biener <rguent...@suse.de> * config/i386/i386-features.c (dimode_scalar_chain::replace_with_subreg): Elide the subreg if the reg is already vector. (dimode_scalar_chain::convert_op): Likewise. (dimode_scalar_chain::make_vector_copies): Use a vector-mode pseudo as destination to avoid later passes copy-propagating it out. (dimode_chain::convert_insn): Convert the source of a memory op in case it is not of the appropriate scalar mode. Raise fatal_insn_not_found if the converted insn is not recognized. Index: gcc/config/i386/i386-features.c =================================================================== --- gcc/config/i386/i386-features.c (revision 274328) +++ gcc/config/i386/i386-features.c (working copy) @@ -573,7 +573,8 @@ rtx dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg) { if (x == reg) - return gen_rtx_SUBREG (V2DImode, new_reg, 0); + return (GET_MODE (new_reg) == V2DImode + ? new_reg : gen_rtx_SUBREG (V2DImode, new_reg, 0)); const char *fmt = GET_RTX_FORMAT (GET_CODE (x)); int i, j; @@ -627,7 +628,7 @@ void dimode_scalar_chain::make_vector_copies (unsigned regno) { rtx reg = regno_reg_rtx[regno]; - rtx vreg = gen_reg_rtx (DImode); + rtx vreg = gen_reg_rtx (V2DImode); df_ref ref; for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref)) @@ -641,7 +642,12 @@ dimode_scalar_chain::make_vector_copies gen_rtx_SUBREG (SImode, reg, 0)); emit_move_insn (adjust_address (tmp, SImode, 4), gen_rtx_SUBREG (SImode, reg, 4)); - emit_move_insn (vreg, tmp); + emit_move_insn (vreg, + gen_rtx_VEC_MERGE (V2DImode, + gen_rtx_VEC_DUPLICATE (V2DImode, + tmp), + CONST0_RTX (V2DImode), + GEN_INT (HOST_WIDE_INT_1U))); } else if (TARGET_SSE4_1) { @@ -841,7 +847,8 @@ dimode_scalar_chain::convert_op (rtx *op gcc_assert (!DF_REF_CHAIN (ref)); break; } - *op = gen_rtx_SUBREG (V2DImode, *op, 0); + if (GET_MODE (*op) != V2DImode) + *op = gen_rtx_SUBREG (V2DImode, *op, 0); } else if (CONST_INT_P (*op)) { @@ -931,6 +938,8 @@ dimode_scalar_chain::convert_insn (rtx_i case MEM: if (!REG_P (dst)) convert_op (&src, insn); + else if (GET_MODE (src) != DImode) + src = gen_rtx_SUBREG (DImode, src, 0); break; case REG: @@ -977,7 +986,9 @@ dimode_scalar_chain::convert_insn (rtx_i PATTERN (insn) = def_set; INSN_CODE (insn) = -1; - recog_memoized (insn); + int patt = recog_memoized (insn); + if (patt == -1) + fatal_insn_not_found (insn); df_insn_rescan (insn); }