On Wed, Aug 14, 2019 at 11:08 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Tue, 13 Aug 2019, Jeff Law wrote:
>
> > On 8/9/19 7:00 AM, Richard Biener wrote:
> > >
> > > It fixes the slowdown observed in 416.gamess and 464.h264ref.
> > >
> > > Bootstrapped on x86_64-unknown-linux-gnu, testing still in progress.
> > >
> > > CCing Jeff who "knows RTL".
> > What specifically do you want me to look at?  I'm not really familiar
> > with the STV stuff, but can certainly take a peek.
>
> Below is the updated patch with the already approved and committed
> parts taken out.  It is now mostly mechanical apart from the
> make_vector_copies and convert_reg changes, which move existing
> "patterns" under appropriate conditionals and add handling of the
> case where the scalar mode fits in a single GPR (previously it
> was -m32 DImode only, now it handles -m32/-m64 SImode and DImode).
>
> I'm redoing bootstrap / regtest on x86_64-unknown-linux-gnu now just
> to be safe.
>
> OK?
>
> I do expect we need to work on the compile-time issue I placed ???
> comments on and more generally try to avoid using DF so much.
>
> Thanks,
> Richard.
>
> 2019-08-13  Richard Biener  <rguent...@suse.de>
>
>         PR target/91154
>         * config/i386/i386-features.h (scalar_chain::scalar_chain): Add
>         mode arguments.
>         (scalar_chain::smode): New member.
>         (scalar_chain::vmode): Likewise.
>         (dimode_scalar_chain): Rename to...
>         (general_scalar_chain): ... this.
>         (general_scalar_chain::general_scalar_chain): Take mode arguments.
>         (timode_scalar_chain::timode_scalar_chain): Initialize scalar_chain
>         base with TImode and V1TImode.
>         * config/i386/i386-features.c (scalar_chain::scalar_chain): Adjust.
>         (general_scalar_chain::vector_const_cost): Adjust for SImode
>         chains.
>         (general_scalar_chain::compute_convert_gain): Likewise.  Add
>         {S,U}{MIN,MAX} support.
>         (general_scalar_chain::replace_with_subreg): Use vmode/smode.
>         (general_scalar_chain::make_vector_copies): Likewise.  Handle
>         non-DImode chains appropriately.
>         (general_scalar_chain::convert_reg): Likewise.
>         (general_scalar_chain::convert_op): Likewise.
>         (general_scalar_chain::convert_insn): Likewise.  Add
>         fatal_insn_not_found if the result is not recognized.
>         (convertible_comparison_p): Pass in the scalar mode and use that.
>         (general_scalar_to_vector_candidate_p): Likewise.  Rename from
>         dimode_scalar_to_vector_candidate_p.  Add {S,U}{MIN,MAX} support.
>         (scalar_to_vector_candidate_p): Remove by inlining into single
>         caller.
>         (general_remove_non_convertible_regs): Rename from
>         dimode_remove_non_convertible_regs.
>         (remove_non_convertible_regs): Remove by inlining into single caller.
>         (convert_scalars_to_vector): Handle SImode and DImode chains
>         in addition to TImode chains.
>         * config/i386/i386.md (<maxmin><MAXMIN_IMODE>3): New expander.
>         (*<maxmin><MAXMIN_IMODE>3_1): New insn-and-split.
>         (*<maxmin>di3_doubleword): Likewise.
>
>         * gcc.target/i386/pr91154.c: New testcase.
>         * gcc.target/i386/minmax-3.c: Likewise.
>         * gcc.target/i386/minmax-4.c: Likewise.
>         * gcc.target/i386/minmax-5.c: Likewise.
>         * gcc.target/i386/minmax-6.c: Likewise.
>         * gcc.target/i386/minmax-1.c: Add -mno-stv.
>         * gcc.target/i386/minmax-2.c: Likewise.
OK.

Thanks,
Uros.

> Index: gcc/config/i386/i386-features.c
> ===================================================================
> --- gcc/config/i386/i386-features.c     (revision 274422)
> +++ gcc/config/i386/i386-features.c     (working copy)
> @@ -276,8 +276,11 @@ unsigned scalar_chain::max_id = 0;
>
>  /* Initialize new chain.  */
>
> -scalar_chain::scalar_chain ()
> +scalar_chain::scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
>  {
> +  smode = smode_;
> +  vmode = vmode_;
> +
>    chain_id = ++max_id;
>
>    if (dump_file)
> @@ -319,7 +322,7 @@ scalar_chain::add_to_queue (unsigned ins
>     conversion.  */
>
>  void
> -dimode_scalar_chain::mark_dual_mode_def (df_ref def)
> +general_scalar_chain::mark_dual_mode_def (df_ref def)
>  {
>    gcc_assert (DF_REF_REG_DEF_P (def));
>
> @@ -409,6 +412,9 @@ scalar_chain::add_insn (bitmap candidate
>        && !HARD_REGISTER_P (SET_DEST (def_set)))
>      bitmap_set_bit (defs, REGNO (SET_DEST (def_set)));
>
> +  /* ??? The following is quadratic since analyze_register_chain
> +     iterates over all refs to look for dual-mode regs.  Instead this
> +     should be done separately for all regs mentioned in the chain once.  */
>    df_ref ref;
>    df_ref def;
>    for (ref = DF_INSN_UID_DEFS (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
> @@ -469,19 +475,21 @@ scalar_chain::build (bitmap candidates,
>     instead of using a scalar one.  */
>
>  int
> -dimode_scalar_chain::vector_const_cost (rtx exp)
> +general_scalar_chain::vector_const_cost (rtx exp)
>  {
>    gcc_assert (CONST_INT_P (exp));
>
> -  if (standard_sse_constant_p (exp, V2DImode))
> -    return COSTS_N_INSNS (1);
> -  return ix86_cost->sse_load[1];
> +  if (standard_sse_constant_p (exp, vmode))
> +    return ix86_cost->sse_op;
> +  /* We have separate costs for SImode and DImode, use SImode costs
> +     for smaller modes.  */
> +  return ix86_cost->sse_load[smode == DImode ? 1 : 0];
>  }
>
>  /* Compute a gain for chain conversion.  */
>
>  int
> -dimode_scalar_chain::compute_convert_gain ()
> +general_scalar_chain::compute_convert_gain ()
>  {
>    bitmap_iterator bi;
>    unsigned insn_uid;
> @@ -491,6 +499,13 @@ dimode_scalar_chain::compute_convert_gai
>    if (dump_file)
>      fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
>
> +  /* SSE costs distinguish between SImode and DImode loads/stores, for
> +     int costs factor in the number of GPRs involved.  When supporting
> +     smaller modes than SImode the int load/store costs need to be
> +     adjusted as well.  */
> +  unsigned sse_cost_idx = smode == DImode ? 1 : 0;
> +  unsigned m = smode == DImode ? (TARGET_64BIT ? 1 : 2) : 1;
> +
>    EXECUTE_IF_SET_IN_BITMAP (insns, 0, insn_uid, bi)
>      {
>        rtx_insn *insn = DF_INSN_UID_GET (insn_uid)->insn;
> @@ -500,18 +515,19 @@ dimode_scalar_chain::compute_convert_gai
>        int igain = 0;
>
>        if (REG_P (src) && REG_P (dst))
> -       igain += 2 - ix86_cost->xmm_move;
> +       igain += 2 * m - ix86_cost->xmm_move;
>        else if (REG_P (src) && MEM_P (dst))
> -       igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
> +       igain
> +         += m * ix86_cost->int_store[2] - ix86_cost->sse_store[sse_cost_idx];
>        else if (MEM_P (src) && REG_P (dst))
> -       igain += 2 * ix86_cost->int_load[2] - ix86_cost->sse_load[1];
> +       igain += m * ix86_cost->int_load[2] - ix86_cost->sse_load[sse_cost_idx];
>        else if (GET_CODE (src) == ASHIFT
>                || GET_CODE (src) == ASHIFTRT
>                || GET_CODE (src) == LSHIFTRT)
>         {
>           if (CONST_INT_P (XEXP (src, 0)))
>             igain -= vector_const_cost (XEXP (src, 0));
> -         igain += 2 * ix86_cost->shift_const - ix86_cost->sse_op;
> +         igain += m * ix86_cost->shift_const - ix86_cost->sse_op;
>           if (INTVAL (XEXP (src, 1)) >= 32)
>             igain -= COSTS_N_INSNS (1);
>         }
> @@ -521,11 +537,11 @@ dimode_scalar_chain::compute_convert_gai
>               || GET_CODE (src) == XOR
>               || GET_CODE (src) == AND)
>         {
> -         igain += 2 * ix86_cost->add - ix86_cost->sse_op;
> +         igain += m * ix86_cost->add - ix86_cost->sse_op;
>           /* Additional gain for andnot for targets without BMI.  */
>           if (GET_CODE (XEXP (src, 0)) == NOT
>               && !TARGET_BMI)
> -           igain += 2 * ix86_cost->add;
> +           igain += m * ix86_cost->add;
>
>           if (CONST_INT_P (XEXP (src, 0)))
>             igain -= vector_const_cost (XEXP (src, 0));
> @@ -534,7 +550,18 @@ dimode_scalar_chain::compute_convert_gai
>         }
>        else if (GET_CODE (src) == NEG
>                || GET_CODE (src) == NOT)
> -       igain += 2 * ix86_cost->add - ix86_cost->sse_op - COSTS_N_INSNS (1);
> +       igain += m * ix86_cost->add - ix86_cost->sse_op - COSTS_N_INSNS (1);
> +      else if (GET_CODE (src) == SMAX
> +              || GET_CODE (src) == SMIN
> +              || GET_CODE (src) == UMAX
> +              || GET_CODE (src) == UMIN)
> +       {
> +         /* We do not have any conditional move cost, estimate it as a
> +            reg-reg move.  Comparisons are costed as adds.  */
> +         igain += m * (COSTS_N_INSNS (2) + ix86_cost->add);
> +         /* Integer SSE ops are all costed the same.  */
> +         igain -= ix86_cost->sse_op;
> +       }
>        else if (GET_CODE (src) == COMPARE)
>         {
>           /* Assume comparison cost is the same.  */
> @@ -542,9 +569,11 @@ dimode_scalar_chain::compute_convert_gai
>        else if (CONST_INT_P (src))
>         {
>           if (REG_P (dst))
> -           igain += 2 * COSTS_N_INSNS (1);
> +           /* DImode can be immediate for TARGET_64BIT and SImode always.  */
> +           igain += m * COSTS_N_INSNS (1);
>           else if (MEM_P (dst))
> -           igain += 2 * ix86_cost->int_store[2] - ix86_cost->sse_store[1];
> +           igain += (m * ix86_cost->int_store[2]
> +                     - ix86_cost->sse_store[sse_cost_idx]);
>           igain -= vector_const_cost (src);
>         }
>        else
> @@ -561,6 +590,7 @@ dimode_scalar_chain::compute_convert_gai
>    if (dump_file)
>      fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
>
> +  /* ??? What about integer to SSE?  */
>    EXECUTE_IF_SET_IN_BITMAP (defs_conv, 0, insn_uid, bi)
>      cost += DF_REG_DEF_COUNT (insn_uid) * ix86_cost->sse_to_integer;
>
> @@ -578,10 +608,10 @@ dimode_scalar_chain::compute_convert_gai
>  /* Replace REG in X with a V2DI subreg of NEW_REG.  */
>
>  rtx
> -dimode_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
> +general_scalar_chain::replace_with_subreg (rtx x, rtx reg, rtx new_reg)
>  {
>    if (x == reg)
> -    return gen_rtx_SUBREG (V2DImode, new_reg, 0);
> +    return gen_rtx_SUBREG (vmode, new_reg, 0);
>
>    const char *fmt = GET_RTX_FORMAT (GET_CODE (x));
>    int i, j;
> @@ -601,7 +631,7 @@ dimode_scalar_chain::replace_with_subreg
>  /* Replace REG in INSN with a V2DI subreg of NEW_REG.  */
>
>  void
> -dimode_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
> +general_scalar_chain::replace_with_subreg_in_insn (rtx_insn *insn,
>                                                    rtx reg, rtx new_reg)
>  {
>    replace_with_subreg (single_set (insn), reg, new_reg);
> @@ -632,10 +662,10 @@ scalar_chain::emit_conversion_insns (rtx
>     and replace its uses in a chain.  */
>
>  void
> -dimode_scalar_chain::make_vector_copies (unsigned regno)
> +general_scalar_chain::make_vector_copies (unsigned regno)
>  {
>    rtx reg = regno_reg_rtx[regno];
> -  rtx vreg = gen_reg_rtx (DImode);
> +  rtx vreg = gen_reg_rtx (smode);
>    df_ref ref;
>
>    for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
> @@ -644,37 +674,59 @@ dimode_scalar_chain::make_vector_copies
>        start_sequence ();
>        if (!TARGET_INTER_UNIT_MOVES_TO_VEC)
>         {
> -         rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
> -         emit_move_insn (adjust_address (tmp, SImode, 0),
> -                         gen_rtx_SUBREG (SImode, reg, 0));
> -         emit_move_insn (adjust_address (tmp, SImode, 4),
> -                         gen_rtx_SUBREG (SImode, reg, 4));
> -         emit_move_insn (vreg, tmp);
> +         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
> +         if (smode == DImode && !TARGET_64BIT)
> +           {
> +             emit_move_insn (adjust_address (tmp, SImode, 0),
> +                             gen_rtx_SUBREG (SImode, reg, 0));
> +             emit_move_insn (adjust_address (tmp, SImode, 4),
> +                             gen_rtx_SUBREG (SImode, reg, 4));
> +           }
> +         else
> +           emit_move_insn (tmp, reg);
> +         emit_insn (gen_rtx_SET
> +                    (gen_rtx_SUBREG (vmode, vreg, 0),
> +                     gen_rtx_VEC_MERGE (vmode,
> +                                        gen_rtx_VEC_DUPLICATE (vmode, tmp),
> +                                        CONST0_RTX (vmode),
> +                                        GEN_INT (HOST_WIDE_INT_1U))));
>         }
> -      else if (TARGET_SSE4_1)
> +      else if (!TARGET_64BIT && smode == DImode)
>         {
> -         emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                                     CONST0_RTX (V4SImode),
> -                                     gen_rtx_SUBREG (SImode, reg, 0)));
> -         emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                                       gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                                       gen_rtx_SUBREG (SImode, reg, 4),
> -                                       GEN_INT (2)));
> +         if (TARGET_SSE4_1)
> +           {
> +             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                                         CONST0_RTX (V4SImode),
> +                                         gen_rtx_SUBREG (SImode, reg, 0)));
> +             emit_insn (gen_sse4_1_pinsrd (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                                           gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                                           gen_rtx_SUBREG (SImode, reg, 4),
> +                                           GEN_INT (2)));
> +           }
> +         else
> +           {
> +             rtx tmp = gen_reg_rtx (DImode);
> +             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                                         CONST0_RTX (V4SImode),
> +                                         gen_rtx_SUBREG (SImode, reg, 0)));
> +             emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> +                                         CONST0_RTX (V4SImode),
> +                                         gen_rtx_SUBREG (SImode, reg, 4)));
> +             emit_insn (gen_vec_interleave_lowv4si
> +                        (gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                         gen_rtx_SUBREG (V4SImode, vreg, 0),
> +                         gen_rtx_SUBREG (V4SImode, tmp, 0)));
> +           }
>         }
>        else
> -       {
> -         rtx tmp = gen_reg_rtx (DImode);
> -         emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                                     CONST0_RTX (V4SImode),
> -                                     gen_rtx_SUBREG (SImode, reg, 0)));
> -         emit_insn (gen_sse2_loadld (gen_rtx_SUBREG (V4SImode, tmp, 0),
> -                                     CONST0_RTX (V4SImode),
> -                                     gen_rtx_SUBREG (SImode, reg, 4)));
> -         emit_insn (gen_vec_interleave_lowv4si
> -                    (gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                     gen_rtx_SUBREG (V4SImode, vreg, 0),
> -                     gen_rtx_SUBREG (V4SImode, tmp, 0)));
> -       }
> +       emit_insn (gen_rtx_SET
> +                  (gen_rtx_SUBREG (vmode, vreg, 0),
> +                   gen_rtx_VEC_MERGE (vmode,
> +                                      gen_rtx_VEC_DUPLICATE (vmode, reg),
> +                                      CONST0_RTX (vmode),
> +                                      GEN_INT (HOST_WIDE_INT_1U))));
>        rtx_insn *seq = get_insns ();
>        end_sequence ();
>        rtx_insn *insn = DF_REF_INSN (ref);
> @@ -703,7 +755,7 @@ dimode_scalar_chain::make_vector_copies
>     in case register is used in not convertible insn.  */
>
>  void
> -dimode_scalar_chain::convert_reg (unsigned regno)
> +general_scalar_chain::convert_reg (unsigned regno)
>  {
>    bool scalar_copy = bitmap_bit_p (defs_conv, regno);
>    rtx reg = regno_reg_rtx[regno];
> @@ -715,7 +767,7 @@ dimode_scalar_chain::convert_reg (unsign
>    bitmap_copy (conv, insns);
>
>    if (scalar_copy)
> -    scopy = gen_reg_rtx (DImode);
> +    scopy = gen_reg_rtx (smode);
>
>    for (ref = DF_REG_DEF_CHAIN (regno); ref; ref = DF_REF_NEXT_REG (ref))
>      {
> @@ -735,40 +787,55 @@ dimode_scalar_chain::convert_reg (unsign
>        start_sequence ();
>        if (!TARGET_INTER_UNIT_MOVES_FROM_VEC)
>         {
> -         rtx tmp = assign_386_stack_local (DImode, SLOT_STV_TEMP);
> +         rtx tmp = assign_386_stack_local (smode, SLOT_STV_TEMP);
>           emit_move_insn (tmp, reg);
> -         emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> -                         adjust_address (tmp, SImode, 0));
> -         emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> -                         adjust_address (tmp, SImode, 4));
> +         if (!TARGET_64BIT && smode == DImode)
> +           {
> +             emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> +                             adjust_address (tmp, SImode, 0));
> +             emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> +                             adjust_address (tmp, SImode, 4));
> +           }
> +         else
> +           emit_move_insn (scopy, tmp);
>         }
> -      else if (TARGET_SSE4_1)
> +      else if (!TARGET_64BIT && smode == DImode)
>         {
> -         rtx tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
> -         emit_insn
> -           (gen_rtx_SET
> -            (gen_rtx_SUBREG (SImode, scopy, 0),
> -             gen_rtx_VEC_SELECT (SImode,
> -                                 gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
> -
> -         tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> -         emit_insn
> -           (gen_rtx_SET
> -            (gen_rtx_SUBREG (SImode, scopy, 4),
> -             gen_rtx_VEC_SELECT (SImode,
> -                                 gen_rtx_SUBREG (V4SImode, reg, 0), tmp)));
> +         if (TARGET_SSE4_1)
> +           {
> +             rtx tmp = gen_rtx_PARALLEL (VOIDmode,
> +                                         gen_rtvec (1, const0_rtx));
> +             emit_insn
> +               (gen_rtx_SET
> +                (gen_rtx_SUBREG (SImode, scopy, 0),
> +                 gen_rtx_VEC_SELECT (SImode,
> +                                     gen_rtx_SUBREG (V4SImode, reg, 0),
> +                                     tmp)));
> +
> +             tmp = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const1_rtx));
> +             emit_insn
> +               (gen_rtx_SET
> +                (gen_rtx_SUBREG (SImode, scopy, 4),
> +                 gen_rtx_VEC_SELECT (SImode,
> +                                     gen_rtx_SUBREG (V4SImode, reg, 0),
> +                                     tmp)));
> +           }
> +         else
> +           {
> +             rtx vcopy = gen_reg_rtx (V2DImode);
> +             emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
> +             emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> +                             gen_rtx_SUBREG (SImode, vcopy, 0));
> +             emit_move_insn (vcopy,
> +                             gen_rtx_LSHIFTRT (V2DImode,
> +                                               vcopy, GEN_INT (32)));
> +             emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> +                             gen_rtx_SUBREG (SImode, vcopy, 0));
> +           }
>         }
>        else
> -       {
> -         rtx vcopy = gen_reg_rtx (V2DImode);
> -         emit_move_insn (vcopy, gen_rtx_SUBREG (V2DImode, reg, 0));
> -         emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 0),
> -                         gen_rtx_SUBREG (SImode, vcopy, 0));
> -         emit_move_insn (vcopy,
> -                         gen_rtx_LSHIFTRT (V2DImode, vcopy, GEN_INT (32)));
> -         emit_move_insn (gen_rtx_SUBREG (SImode, scopy, 4),
> -                         gen_rtx_SUBREG (SImode, vcopy, 0));
> -       }
> +       emit_move_insn (scopy, reg);
> +
>        rtx_insn *seq = get_insns ();
>        end_sequence ();
>        emit_conversion_insns (seq, insn);
> @@ -817,21 +884,21 @@ dimode_scalar_chain::convert_reg (unsign
>     registers conversion.
>  */
>
>  void
> -dimode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
> +general_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
>  {
>    *op = copy_rtx_if_shared (*op);
>
>    if (GET_CODE (*op) == NOT)
>      {
>        convert_op (&XEXP (*op, 0), insn);
> -      PUT_MODE (*op, V2DImode);
> +      PUT_MODE (*op, vmode);
>      }
>    else if (MEM_P (*op))
>      {
> -      rtx tmp = gen_reg_rtx (DImode);
> +      rtx tmp = gen_reg_rtx (GET_MODE (*op));
>
>        emit_insn_before (gen_move_insn (tmp, *op), insn);
> -      *op = gen_rtx_SUBREG (V2DImode, tmp, 0);
> +      *op = gen_rtx_SUBREG (vmode, tmp, 0);
>
>        if (dump_file)
>         fprintf (dump_file, "  Preloading operand for insn %d into r%d\n",
> @@ -849,24 +916,30 @@ dimode_scalar_chain::convert_op (rtx *op
>           gcc_assert (!DF_REF_CHAIN (ref));
>           break;
>         }
> -      *op = gen_rtx_SUBREG (V2DImode, *op, 0);
> +      *op = gen_rtx_SUBREG (vmode, *op, 0);
>      }
>    else if (CONST_INT_P (*op))
>      {
>        rtx vec_cst;
> -      rtx tmp = gen_rtx_SUBREG (V2DImode, gen_reg_rtx (DImode), 0);
> +      rtx tmp = gen_rtx_SUBREG (vmode, gen_reg_rtx (smode), 0);
>
>        /* Prefer all ones vector in case of -1.  */
>        if (constm1_operand (*op, GET_MODE (*op)))
> -       vec_cst = CONSTM1_RTX (V2DImode);
> +       vec_cst = CONSTM1_RTX (vmode);
>        else
> -       vec_cst = gen_rtx_CONST_VECTOR (V2DImode,
> -                                       gen_rtvec (2, *op, const0_rtx));
> +       {
> +         unsigned n = GET_MODE_NUNITS (vmode);
> +         rtx *v = XALLOCAVEC (rtx, n);
> +         v[0] = *op;
> +         for (unsigned i = 1; i < n; ++i)
> +           v[i] = const0_rtx;
> +         vec_cst = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (n, v));
> +       }
>
> -      if (!standard_sse_constant_p (vec_cst, V2DImode))
> +      if (!standard_sse_constant_p (vec_cst, vmode))
>         {
>           start_sequence ();
> -         vec_cst = validize_mem (force_const_mem (V2DImode, vec_cst));
> +         vec_cst = validize_mem (force_const_mem (vmode, vec_cst));
>           rtx_insn *seq = get_insns ();
>           end_sequence ();
>           emit_insn_before (seq, insn);
> @@ -878,14 +951,14 @@ dimode_scalar_chain::convert_op (rtx *op
>    else
>      {
>        gcc_assert (SUBREG_P (*op));
> -      gcc_assert (GET_MODE (*op) == V2DImode);
> +      gcc_assert (GET_MODE (*op) == vmode);
>      }
>  }
>
>  /* Convert INSN to vector mode.  */
>
>  void
> -dimode_scalar_chain::convert_insn (rtx_insn *insn)
> +general_scalar_chain::convert_insn (rtx_insn *insn)
>  {
>    rtx def_set = single_set (insn);
>    rtx src = SET_SRC (def_set);
> @@ -896,9 +969,9 @@ dimode_scalar_chain::convert_insn (rtx_i
>      {
>        /* There are no scalar integer instructions and therefore
>          temporary register usage is required.  */
> -      rtx tmp = gen_reg_rtx (DImode);
> +      rtx tmp = gen_reg_rtx (smode);
>        emit_conversion_insns (gen_move_insn (dst, tmp), insn);
> -      dst = gen_rtx_SUBREG (V2DImode, tmp, 0);
> +      dst = gen_rtx_SUBREG (vmode, tmp, 0);
>      }
>
>    switch (GET_CODE (src))
> @@ -907,7 +980,7 @@ dimode_scalar_chain::convert_insn (rtx_i
>      case ASHIFTRT:
>      case LSHIFTRT:
>        convert_op (&XEXP (src, 0), insn);
> -      PUT_MODE (src, V2DImode);
> +      PUT_MODE (src, vmode);
>        break;
>
>      case PLUS:
> @@ -915,25 +988,29 @@ dimode_scalar_chain::convert_insn (rtx_i
>      case IOR:
>      case XOR:
>      case AND:
> +    case SMAX:
> +    case SMIN:
> +    case UMAX:
> +    case UMIN:
>        convert_op (&XEXP (src, 0), insn);
>        convert_op (&XEXP (src, 1), insn);
> -      PUT_MODE (src, V2DImode);
> +      PUT_MODE (src, vmode);
>        break;
>
>      case NEG:
>        src = XEXP (src, 0);
>        convert_op (&src, insn);
> -      subreg = gen_reg_rtx (V2DImode);
> -      emit_insn_before (gen_move_insn (subreg, CONST0_RTX (V2DImode)), insn);
> -      src = gen_rtx_MINUS (V2DImode, subreg, src);
> +      subreg = gen_reg_rtx (vmode);
> +      emit_insn_before (gen_move_insn (subreg, CONST0_RTX (vmode)), insn);
> +      src = gen_rtx_MINUS (vmode, subreg, src);
>        break;
>
>      case NOT:
>        src = XEXP (src, 0);
>        convert_op (&src, insn);
> -      subreg = gen_reg_rtx (V2DImode);
> -      emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (V2DImode)), insn);
> -      src = gen_rtx_XOR (V2DImode, src, subreg);
> +      subreg = gen_reg_rtx (vmode);
> +      emit_insn_before (gen_move_insn (subreg, CONSTM1_RTX (vmode)), insn);
> +      src = gen_rtx_XOR (vmode, src, subreg);
>        break;
>
>      case MEM:
> @@ -947,17 +1024,17 @@ dimode_scalar_chain::convert_insn (rtx_i
>        break;
>
>      case SUBREG:
> -      gcc_assert (GET_MODE (src) == V2DImode);
> +      gcc_assert (GET_MODE (src) == vmode);
>        break;
>
>      case COMPARE:
>        src = SUBREG_REG (XEXP (XEXP (src, 0), 0));
>
> -      gcc_assert ((REG_P (src) && GET_MODE (src) == DImode)
> -                 || (SUBREG_P (src) && GET_MODE (src) == V2DImode));
> +      gcc_assert ((REG_P (src) && GET_MODE (src) == GET_MODE_INNER (vmode))
> +                 || (SUBREG_P (src) && GET_MODE (src) == vmode));
>
>        if (REG_P (src))
> -       subreg = gen_rtx_SUBREG (V2DImode, src, 0);
> +       subreg = gen_rtx_SUBREG (vmode, src, 0);
>        else
>         subreg = copy_rtx_if_shared (src);
>        emit_insn_before (gen_vec_interleave_lowv2di (copy_rtx_if_shared (subreg),
> @@ -985,7 +1062,9 @@ dimode_scalar_chain::convert_insn (rtx_i
>    PATTERN (insn) = def_set;
>
>    INSN_CODE (insn) = -1;
> -  recog_memoized (insn);
> +  int patt = recog_memoized (insn);
> +  if  (patt == -1)
> +    fatal_insn_not_found (insn);
>    df_insn_rescan (insn);
>  }
>
> @@ -1124,7 +1203,7 @@ timode_scalar_chain::convert_insn (rtx_i
>  }
>
>  void
> -dimode_scalar_chain::convert_registers ()
> +general_scalar_chain::convert_registers ()
>  {
>    bitmap_iterator bi;
>    unsigned id;
> @@ -1194,7 +1273,7 @@ has_non_address_hard_reg (rtx_insn *insn
>       (const_int 0 [0])))  */
>
>  static bool
> -convertible_comparison_p (rtx_insn *insn)
> +convertible_comparison_p (rtx_insn *insn, enum machine_mode mode)
>  {
>    if (!TARGET_SSE4_1)
>      return false;
> @@ -1227,12 +1306,12 @@ convertible_comparison_p (rtx_insn *insn
>
>    if (!SUBREG_P (op1)
>        || !SUBREG_P (op2)
> -      || GET_MODE (op1) != SImode
> -      || GET_MODE (op2) != SImode
> +      || GET_MODE (op1) != mode
> +      || GET_MODE (op2) != mode
>        || ((SUBREG_BYTE (op1) != 0
> -          || SUBREG_BYTE (op2) != GET_MODE_SIZE (SImode))
> +          || SUBREG_BYTE (op2) != GET_MODE_SIZE (mode))
>           && (SUBREG_BYTE (op2) != 0
> -             || SUBREG_BYTE (op1) != GET_MODE_SIZE (SImode))))
> +             || SUBREG_BYTE (op1) != GET_MODE_SIZE (mode))))
>      return false;
>
>    op1 = SUBREG_REG (op1);
> @@ -1240,7 +1319,7 @@ convertible_comparison_p (rtx_insn *insn
>
>    if (op1 != op2
>        || !REG_P (op1)
> -      || GET_MODE (op1) != DImode)
> +      || GET_MODE (op1) != GET_MODE_WIDER_MODE (mode).else_blk ())
>      return false;
>
>    return true;
> @@ -1249,7 +1328,7 @@ convertible_comparison_p (rtx_insn *insn
>  /* The DImode version of scalar_to_vector_candidate_p.  */
>
>  static bool
> -dimode_scalar_to_vector_candidate_p (rtx_insn *insn)
> +general_scalar_to_vector_candidate_p (rtx_insn *insn, enum machine_mode mode)
>  {
>    rtx def_set = single_set (insn);
>
> @@ -1263,12 +1342,12 @@ dimode_scalar_to_vector_candidate_p (rtx
>    rtx dst = SET_DEST (def_set);
>
>    if (GET_CODE (src) == COMPARE)
> -    return convertible_comparison_p (insn);
> +    return convertible_comparison_p (insn, mode);
>
>    /* We are interested in DImode promotion only.  */
> -  if ((GET_MODE (src) != DImode
> +  if ((GET_MODE (src) != mode
>         && !CONST_INT_P (src))
> -      || GET_MODE (dst) != DImode)
> +      || GET_MODE (dst) != mode)
>      return false;
>
>    if (!REG_P (dst) && !MEM_P (dst))
> @@ -1288,6 +1367,15 @@ dimode_scalar_to_vector_candidate_p (rtx
>         return false;
>        break;
>
> +    case SMAX:
> +    case SMIN:
> +    case UMAX:
> +    case UMIN:
> +      if ((mode == DImode && !TARGET_AVX512VL)
> +         || (mode == SImode && !TARGET_SSE4_1))
> +       return false;
> +      /* Fallthru.  */
> +
>      case PLUS:
>      case MINUS:
>      case IOR:
> @@ -1298,7 +1386,7 @@ dimode_scalar_to_vector_candidate_p (rtx
>           && !CONST_INT_P (XEXP (src, 1)))
>         return false;
>
> -      if (GET_MODE (XEXP (src, 1)) != DImode
> +      if (GET_MODE (XEXP (src, 1)) != mode
>           && !CONST_INT_P (XEXP (src, 1)))
>         return false;
>        break;
> @@ -1327,7 +1415,7 @@ dimode_scalar_to_vector_candidate_p (rtx
>           || !REG_P (XEXP (XEXP (src, 0), 0))))
>         return false;
>
> -      if (GET_MODE (XEXP (src, 0)) != DImode
> +      if (GET_MODE (XEXP (src, 0)) != mode
>           && !CONST_INT_P (XEXP (src, 0)))
>         return false;
>
> @@ -1391,22 +1479,16 @@ timode_scalar_to_vector_candidate_p (rtx
>        return false;
>      }
>
> -/* Return 1 if INSN may be converted into vector
> -   instruction.  */
> -
> -static bool
> -scalar_to_vector_candidate_p (rtx_insn *insn)
> -{
> -  if (TARGET_64BIT)
> -    return timode_scalar_to_vector_candidate_p (insn);
> -  else
> -    return dimode_scalar_to_vector_candidate_p (insn);
> -}
> +/* For a given bitmap of insn UIDs scans all instruction and
> +   remove insn from CANDIDATES in case it has both convertible
> +   and not convertible definitions.
>
> -/* The DImode version of remove_non_convertible_regs.  */
> +   All insns in a bitmap are conversion candidates according to
> +   scalar_to_vector_candidate_p.  Currently it implies all insns
> +   are single_set.  */
>
>  static void
> -dimode_remove_non_convertible_regs (bitmap candidates)
> +general_remove_non_convertible_regs (bitmap candidates)
>  {
>    bitmap_iterator bi;
>    unsigned id;
> @@ -1561,23 +1643,6 @@ timode_remove_non_convertible_regs (bitm
>    BITMAP_FREE (regs);
>  }
>
> -/* For a given bitmap of insn UIDs scans all instruction and
> -   remove insn from CANDIDATES in case it has both convertible
> -   and not convertible definitions.
> -
> -   All insns in a bitmap are conversion candidates according to
> -   scalar_to_vector_candidate_p.  Currently it implies all insns
> -   are single_set.  */
> -
> -static void
> -remove_non_convertible_regs (bitmap candidates)
> -{
> -  if (TARGET_64BIT)
> -    timode_remove_non_convertible_regs (candidates);
> -  else
> -    dimode_remove_non_convertible_regs (candidates);
> -}
> -
>  /* Main STV pass function.  Find and convert scalar
>     instructions into vector mode when profitable.
>  */
>
> @@ -1585,11 +1650,14 @@ static unsigned int
>  convert_scalars_to_vector ()
>  {
>    basic_block bb;
> -  bitmap candidates;
>    int converted_insns = 0;
>
>    bitmap_obstack_initialize (NULL);
> -  candidates = BITMAP_ALLOC (NULL);
> +  const machine_mode cand_mode[3] = { SImode, DImode, TImode };
> +  const machine_mode cand_vmode[3] = { V4SImode, V2DImode, V1TImode };
> +  bitmap_head candidates[3];  /* { SImode, DImode, TImode } */
> +  for (unsigned i = 0; i < 3; ++i)
> +    bitmap_initialize (&candidates[i], &bitmap_default_obstack);
>
>    calculate_dominance_info (CDI_DOMINATORS);
>    df_set_flags (DF_DEFER_INSN_RESCAN);
> @@ -1605,51 +1673,73 @@ convert_scalars_to_vector ()
>      {
>        rtx_insn *insn;
>        FOR_BB_INSNS (bb, insn)
> -       if (scalar_to_vector_candidate_p (insn))
> +       if (TARGET_64BIT
> +           && timode_scalar_to_vector_candidate_p (insn))
>           {
>             if (dump_file)
> -             fprintf (dump_file, "  insn %d is marked as a candidate\n",
> +             fprintf (dump_file, "  insn %d is marked as a TImode candidate\n",
>                        INSN_UID (insn));
>
> -           bitmap_set_bit (candidates, INSN_UID (insn));
> +           bitmap_set_bit (&candidates[2], INSN_UID (insn));
> +         }
> +       else
> +         {
> +           /* Check {SI,DI}mode.  */
> +           for (unsigned i = 0; i <= 1; ++i)
> +             if (general_scalar_to_vector_candidate_p (insn, cand_mode[i]))
> +               {
> +                 if (dump_file)
> +                   fprintf (dump_file, "  insn %d is marked as a %s candidate\n",
> +                            INSN_UID (insn), i == 0 ? "SImode" : "DImode");
> +
> +                 bitmap_set_bit (&candidates[i], INSN_UID (insn));
> +                 break;
> +               }
>           }
>      }
>
> -  remove_non_convertible_regs (candidates);
> +  if (TARGET_64BIT)
> +    timode_remove_non_convertible_regs (&candidates[2]);
> +  for (unsigned i = 0; i <= 1; ++i)
> +    general_remove_non_convertible_regs (&candidates[i]);
>
> -  if (bitmap_empty_p (candidates))
> -    if (dump_file)
> +  for (unsigned i = 0; i <= 2; ++i)
> +    if (!bitmap_empty_p (&candidates[i]))
> +      break;
> +    else if (i == 2 && dump_file)
>        fprintf (dump_file, "There are no candidates for optimization.\n");
>
> -  while (!bitmap_empty_p (candidates))
> -    {
> -      unsigned uid = bitmap_first_set_bit (candidates);
> -      scalar_chain *chain;
> +  for (unsigned i = 0; i <= 2; ++i)
> +    while (!bitmap_empty_p (&candidates[i]))
> +      {
> +       unsigned uid = bitmap_first_set_bit (&candidates[i]);
> +       scalar_chain *chain;
>
> -      if (TARGET_64BIT)
> -       chain = new timode_scalar_chain;
> -      else
> -       chain = new dimode_scalar_chain;
> +       if (cand_mode[i] == TImode)
> +         chain = new timode_scalar_chain;
> +       else
> +         chain = new general_scalar_chain (cand_mode[i], cand_vmode[i]);
>
> -      /* Find instructions chain we want to convert to vector mode.
> -        Check all uses and definitions to estimate all required
> -        conversions.  */
> -      chain->build (candidates, uid);
> +       /* Find instructions chain we want to convert to vector mode.
> +          Check all uses and definitions to estimate all required
> +          conversions.  */
> +       chain->build (&candidates[i], uid);
>
> -      if (chain->compute_convert_gain () > 0)
> -       converted_insns += chain->convert ();
> -      else
> -       if (dump_file)
> -         fprintf (dump_file, "Chain #%d conversion is not profitable\n",
> -                  chain->chain_id);
> +       if (chain->compute_convert_gain () > 0)
> +         converted_insns += chain->convert ();
> +       else
> +         if (dump_file)
> +           fprintf (dump_file, "Chain #%d conversion is not profitable\n",
> +                    chain->chain_id);
>
> -      delete chain;
> -    }
> +       delete chain;
> +      }
>
>    if (dump_file)
>      fprintf (dump_file, "Total insns converted: %d\n", converted_insns);
>
> -  BITMAP_FREE (candidates);
> +  for (unsigned i = 0; i <= 2; ++i)
> +    bitmap_release (&candidates[i]);
>    bitmap_obstack_release (NULL);
>    df_process_deferred_rescans ();
>
> Index: gcc/config/i386/i386-features.h
> ===================================================================
> --- gcc/config/i386/i386-features.h     (revision 274422)
> +++ gcc/config/i386/i386-features.h     (working copy)
> @@ -127,11 +127,16 @@ namespace {
>  class scalar_chain
>  {
>   public:
> -  scalar_chain ();
> +  scalar_chain (enum machine_mode, enum machine_mode);
>    virtual ~scalar_chain ();
>
>    static unsigned max_id;
>
> +  /* Scalar mode.  */
> +  enum machine_mode smode;
> +  /* Vector mode.  */
> +  enum machine_mode vmode;
> +
>    /* ID of a chain.  */
>    unsigned int chain_id;
>    /* A queue of instructions to be included into a chain.
*/ > @@ -159,9 +164,11 @@ class scalar_chain > virtual void convert_registers () = 0; > }; > > -class dimode_scalar_chain : public scalar_chain > +class general_scalar_chain : public scalar_chain > { > public: > + general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_) > + : scalar_chain (smode_, vmode_) {} > int compute_convert_gain (); > private: > void mark_dual_mode_def (df_ref def); > @@ -178,6 +185,8 @@ class dimode_scalar_chain : public scala > class timode_scalar_chain : public scalar_chain > { > public: > + timode_scalar_chain () : scalar_chain (TImode, V1TImode) {} > + > /* Convert from TImode to V1TImode is always faster. */ > int compute_convert_gain () { return 1; } > > Index: gcc/config/i386/i386.md > =================================================================== > --- gcc/config/i386/i386.md (revision 274422) > +++ gcc/config/i386/i386.md (working copy) > @@ -17719,6 +17719,110 @@ (define_expand "add<mode>cc" > (match_operand:SWI 3 "const_int_operand")] > "" > "if (ix86_expand_int_addcc (operands)) DONE; else FAIL;") > + > +;; min/max patterns > + > +(define_mode_iterator MAXMIN_IMODE > + [(SI "TARGET_SSE4_1") (DI "TARGET_AVX512VL")]) > +(define_code_attr maxmin_rel > + [(smax "GE") (smin "LE") (umax "GEU") (umin "LEU")]) > + > +(define_expand "<code><mode>3" > + [(parallel > + [(set (match_operand:MAXMIN_IMODE 0 "register_operand") > + (maxmin:MAXMIN_IMODE > + (match_operand:MAXMIN_IMODE 1 "register_operand") > + (match_operand:MAXMIN_IMODE 2 "nonimmediate_operand"))) > + (clobber (reg:CC FLAGS_REG))])] > + "TARGET_STV") > + > +(define_insn_and_split "*<code><mode>3_1" > + [(set (match_operand:MAXMIN_IMODE 0 "register_operand") > + (maxmin:MAXMIN_IMODE > + (match_operand:MAXMIN_IMODE 1 "register_operand") > + (match_operand:MAXMIN_IMODE 2 "nonimmediate_operand"))) > + (clobber (reg:CC FLAGS_REG))] > + "(TARGET_64BIT || <MODE>mode != DImode) && TARGET_STV > + && can_create_pseudo_p ()" > + "#" > + "&& 1" > + [(set (match_dup 0) 
> +	(if_then_else:MAXMIN_IMODE (match_dup 3)
> +	  (match_dup 1)
> +	  (match_dup 2)))]
> +{
> +  machine_mode mode = <MODE>mode;
> +
> +  if (!register_operand (operands[2], mode))
> +    operands[2] = force_reg (mode, operands[2]);
> +
> +  enum rtx_code code = <maxmin_rel>;
> +  machine_mode cmpmode = SELECT_CC_MODE (code, operands[1], operands[2]);
> +  rtx flags = gen_rtx_REG (cmpmode, FLAGS_REG);
> +
> +  rtx tmp = gen_rtx_COMPARE (cmpmode, operands[1], operands[2]);
> +  emit_insn (gen_rtx_SET (flags, tmp));
> +
> +  operands[3] = gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx);
> +})
> +
> +(define_insn_and_split "*<code>di3_doubleword"
> +  [(set (match_operand:DI 0 "register_operand")
> +	(maxmin:DI (match_operand:DI 1 "register_operand")
> +		   (match_operand:DI 2 "nonimmediate_operand")))
> +   (clobber (reg:CC FLAGS_REG))]
> +  "!TARGET_64BIT && TARGET_STV && TARGET_AVX512VL
> +   && can_create_pseudo_p ()"
> +  "#"
> +  "&& 1"
> +  [(set (match_dup 0)
> +	(if_then_else:SI (match_dup 6)
> +	  (match_dup 1)
> +	  (match_dup 2)))
> +   (set (match_dup 3)
> +	(if_then_else:SI (match_dup 6)
> +	  (match_dup 4)
> +	  (match_dup 5)))]
> +{
> +  if (!register_operand (operands[2], DImode))
> +    operands[2] = force_reg (DImode, operands[2]);
> +
> +  split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]);
> +
> +  rtx cmplo[2] = { operands[1], operands[2] };
> +  rtx cmphi[2] = { operands[4], operands[5] };
> +
> +  enum rtx_code code = <maxmin_rel>;
> +
> +  switch (code)
> +    {
> +    case LE: case LEU:
> +      std::swap (cmplo[0], cmplo[1]);
> +      std::swap (cmphi[0], cmphi[1]);
> +      code = swap_condition (code);
> +      /* FALLTHRU */
> +
> +    case GE: case GEU:
> +      {
> +	bool uns = (code == GEU);
> +	rtx (*sbb_insn) (machine_mode, rtx, rtx, rtx)
> +	  = uns ? gen_sub3_carry_ccc : gen_sub3_carry_ccgz;
> +
> +	emit_insn (gen_cmp_1 (SImode, cmplo[0], cmplo[1]));
> +
> +	rtx tmp = gen_rtx_SCRATCH (SImode);
> +	emit_insn (sbb_insn (SImode, tmp, cmphi[0], cmphi[1]));
> +
> +	rtx flags = gen_rtx_REG (uns ? CCCmode : CCGZmode, FLAGS_REG);
> +	operands[6] = gen_rtx_fmt_ee (code, VOIDmode, flags, const0_rtx);
> +
> +	break;
> +      }
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +})
>
>  ;; Misc patterns (?)
>
> Index: gcc/testsuite/gcc.target/i386/minmax-1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-1.c	(revision 274422)
> +++ gcc/testsuite/gcc.target/i386/minmax-1.c	(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2 -march=opteron" } */
> +/* { dg-options "-O2 -march=opteron -mno-stv" } */
>  /* { dg-final { scan-assembler "test" } } */
>  /* { dg-final { scan-assembler-not "cmp" } } */
>  #define max(a,b) (((a) > (b))? (a) : (b))
> Index: gcc/testsuite/gcc.target/i386/minmax-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-2.c	(revision 274422)
> +++ gcc/testsuite/gcc.target/i386/minmax-2.c	(working copy)
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O2" } */
> +/* { dg-options "-O2 -mno-stv" } */
>  /* { dg-final { scan-assembler "test" } } */
>  /* { dg-final { scan-assembler-not "cmp" } } */
>  #define max(a,b) (((a) > (b))? (a) : (b))
> Index: gcc/testsuite/gcc.target/i386/minmax-3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-3.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/minmax-3.c	(working copy)
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mstv" } */
> +
> +#define max(a,b) (((a) > (b))? (a) : (b))
> +#define min(a,b) (((a) < (b))? (a) : (b))
> +
> +int ssi[1024];
> +unsigned int usi[1024];
> +long long sdi[1024];
> +unsigned long long udi[1024];
> +
> +#define CHECK(FN, VARIANT) \
> +void \
> +FN ## VARIANT (void) \
> +{ \
> +  for (int i = 1; i < 1024; ++i) \
> +    VARIANT[i] = FN(VARIANT[i-1], VARIANT[i]); \
> +}
> +
> +CHECK(max, ssi);
> +CHECK(min, ssi);
> +CHECK(max, usi);
> +CHECK(min, usi);
> +CHECK(max, sdi);
> +CHECK(min, sdi);
> +CHECK(max, udi);
> +CHECK(min, udi);
> Index: gcc/testsuite/gcc.target/i386/minmax-4.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-4.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/minmax-4.c	(working copy)
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mstv -msse4.1" } */
> +
> +#include "minmax-3.c"
> +
> +/* { dg-final { scan-assembler-times "pmaxsd" 1 } } */
> +/* { dg-final { scan-assembler-times "pmaxud" 1 } } */
> +/* { dg-final { scan-assembler-times "pminsd" 1 } } */
> +/* { dg-final { scan-assembler-times "pminud" 1 } } */
> Index: gcc/testsuite/gcc.target/i386/minmax-5.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-5.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/minmax-5.c	(working copy)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mstv -mavx512vl" } */
> +
> +#include "minmax-3.c"
> +
> +/* { dg-final { scan-assembler-times "vpmaxsd" 1 } } */
> +/* { dg-final { scan-assembler-times "vpmaxud" 1 } } */
> +/* { dg-final { scan-assembler-times "vpminsd" 1 } } */
> +/* { dg-final { scan-assembler-times "vpminud" 1 } } */
> +/* { dg-final { scan-assembler-times "vpmaxsq" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "vpmaxuq" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "vpminsq" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "vpminuq" 1 { target lp64 } } } */
> Index: gcc/testsuite/gcc.target/i386/minmax-6.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/minmax-6.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/minmax-6.c	(working copy)
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -march=haswell" } */
> +
> +unsigned short
> +UMVLine16Y_11 (short unsigned int * Pic, int y, int width)
> +{
> +  if (y != width)
> +    {
> +      y = y < 0 ? 0 : y;
> +      return Pic[y * width];
> +    }
> +  return Pic[y];
> +}
> +
> +/* We do not want the RA to spill %esi for its dual use, but using
> +   pmaxsd is OK.  */
> +/* { dg-final { scan-assembler-not "rsp" { target { ! { ia32 } } } } } */
> +/* { dg-final { scan-assembler "pmaxsd" } } */
> Index: gcc/testsuite/gcc.target/i386/pr91154.c
> ===================================================================
> --- gcc/testsuite/gcc.target/i386/pr91154.c	(nonexistent)
> +++ gcc/testsuite/gcc.target/i386/pr91154.c	(working copy)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse4.1 -mstv" } */
> +
> +void foo (int *dc, int *mc, int *tpdd, int *tpmd, int M)
> +{
> +  int sc;
> +  int k;
> +  for (k = 1; k <= M; k++)
> +    {
> +      dc[k] = dc[k-1] + tpdd[k-1];
> +      if ((sc = mc[k-1] + tpmd[k-1]) > dc[k]) dc[k] = sc;
> +      if (dc[k] < -987654321) dc[k] = -987654321;
> +    }
> +}
> +
> +/* We want to convert the loop to SSE since SSE pmaxsd is faster than
> +   compare + conditional move.  */
> +/* { dg-final { scan-assembler-not "cmov" } } */
> +/* { dg-final { scan-assembler-times "pmaxsd" 2 } } */
> +/* { dg-final { scan-assembler-times "paddd" 2 } } */
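For readers less familiar with the flags trickery in the `*<code>di3_doubleword` splitter above: it compares the low words first (setting the borrow flag) and then subtracts the high words with that borrow folded in via sbb, so a single flags read decides the whole 64-bit comparison. A minimal C sketch of that idea, with illustrative helper names of my own (not anything the patch defines), modelling GEU and a doubleword umax on top of it:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the cmp/sbb sequence the splitter emits on !TARGET_64BIT:
   cmp on the low words produces a borrow, sbb folds it into the high
   word subtraction, and "no borrow out of the high word" means a >= b
   for an unsigned 64-bit comparison.  */
static bool
uge64_doubleword (uint32_t a_lo, uint32_t a_hi,
                  uint32_t b_lo, uint32_t b_hi)
{
  unsigned borrow = a_lo < b_lo;                    /* cmp lo: CF = borrow */
  uint64_t diff = (uint64_t) a_hi - b_hi - borrow;  /* sbb hi              */
  return ((diff >> 32) & 1) == 0;                   /* CF clear: a >= b    */
}

/* The splitter then writes each half of the result with a conditional
   move keyed on those same flags; selecting the whole 64-bit value at
   once models the same outcome.  */
static uint64_t
umax64 (uint64_t a, uint64_t b)
{
  bool ge = uge64_doubleword ((uint32_t) a, (uint32_t) (a >> 32),
                              (uint32_t) b, (uint32_t) (b >> 32));
  return ge ? a : b;
}
```

The LE/LEU cases in the splitter reduce to this by swapping the operand pairs first, which is why only GE/GEU flag conditions are ever materialized.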