I see, let the vec_dup enter the rtx_cost again to append the total to vmv, I have a try testing. For example with below change:
+ switch (rcode) + { + case VEC_DUPLICATE: + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); + break; + case PLUS: + { + rtx op_0 = XEXP (x, 0); + rtx op_1 = XEXP (x, 1); + if (GET_CODE (op_0) == VEC_DUPLICATE + || GET_CODE (op_1) == VEC_DUPLICATE) + *total += get_vector_costs ()->regmove->GR2VR * COSTS_N_INSNS (1); + else + *total = COSTS_N_INSNS (1); + break; + } + default: + *total = COSTS_N_INSNS (1); + break; + } + + return true; For case_0, GR2VR is 0, we will have late-combine as blow: 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0) 53 │ into: 54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI 55 │ REG_DEAD r146:RVVM1SI 56 │ successfully matched this instruction to *add_vx_rvvm1si: 57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0)) 59 │ (reg:RVVM1SI 146))) 60 │ original cost = 8 + 4 (weighted: 39.483637), replacement cost = 8 (weighted: 64.727273); rejecting replacement For case_0, GR2VR is 1, we will have late-combine as blow: 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0) 53 │ into: 54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI 55 │ REG_DEAD r146:RVVM1SI 56 │ successfully matched this instruction to *add_vx_rvvm1si: 57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0)) 59 │ (reg:RVVM1SI 146))) 60 │ original cost = 12 + 4 (weighted: 43.043637), replacement cost = 12 (weighted: 97.090910); rejecting replacement For case_0, GR2VR is 2, we will have late-combine as blow: 51 │ trying to combine definition of r135 in: 52 │ 11: r135:RVVM1SI=vec_duplicate(r150:DI#0) 53 │ into: 54 │ 18: r147:RVVM1SI=r146:RVVM1SI+r135:RVVM1SI 55 │ REG_DEAD r146:RVVM1SI 56 │ successfully matched this instruction to *add_vx_rvvm1si: 57 │ (set (reg:RVVM1SI 147 [ vect__6.8_16 ]) 58 │ (plus:RVVM1SI (vec_duplicate:RVVM1SI (subreg/s/u:SI (reg:DI 150 [ x ]) 0)) 59 │ (reg:RVVM1SI 146))) 60 │ original cost = 16 + 4 (weighted: 46.603637), replacement cost = 16 (weighted: 129.454546); rejecting replacement The vadd v, vec_dup(x) seems has the same cost as vec_dup here. I am also confused about the how we calculate the vadd v, vec_dup(x), can we just set its' cost to vadd.vx? given we have define_insn_and_split to match the pattern and emit the vadd.vx directly. And it matches the expr we mentioned vadd.vv + vec == vadd.vx. Please help to correct me if misunderstanding. ------------- cut line ------------- > Could you give a short example where that doesn't work so I can have a quick > look myself? Sorry for misleading, it is not something doesn't work, just would like to make it clean before next step as it is quit complicate up to a point. However, I use below example for testing, case_0 for only vadd.vv in loop, while case_1 for vmv and vadd.vv in loop. 1 │ #include <stdint.h> 2 │ 3 │ #define T int32_t 4 │ #define OP + 5 │ 6 │ void 7 │ test_vx_binary_case_0 (T * restrict out, T * restrict in, T x, unsigned n) 8 │ { 9 │ for (unsigned i = 0; i < n; i++) 10 │ out[i] = in[i] OP x; 11 │ } 12 │ 13 │ void 14 │ test_vx_binary_case_1 (T * restrict out, T * restrict a, T b, unsigned n) 15 │ { 16 │ unsigned k = 0; 17 │ 18 │ T xb = b + 3; 19 │ 20 │ while (k < n) { 21 │ out[k + 0] = a[k + 0] OP xb; 22 │ out[k + 1] = a[k + 1] OP xb; 23 │ out[k + 2] = a[k + 2] OP xb; 24 │ out[k + 3] = a[k + 3] OP xb; 25 │ k += 4; 26 │ 27 │ xb = xb ^ 0x3f; 28 │ 29 │ out[k + 0] = a[k + 0] OP xb; 30 │ out[k + 1] = a[k + 1] OP xb; 31 │ out[k + 2] = a[k + 2] OP xb; 32 │ out[k + 3] = a[k + 3] OP xb; 33 │ k += 4; 34 │ } 35 │ } Pan -----Original Message----- From: Robin Dapp <rdapp....@gmail.com> Sent: Tuesday, April 29, 2025 2:31 PM To: Li, Pan2 <pan2...@intel.com>; Robin Dapp <rdapp....@gmail.com>; gcc-patches@gcc.gnu.org Cc: juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com; Chen, Ken <ken.c...@intel.com>; Liu, Hongtao <hongtao....@intel.com>; Robin Dapp <rdapp....@gmail.com> Subject: Re: [PATCH v2 1/3] RISC-V: Combine vec_duplicate + vadd.vv to vadd.vx on GR2VR cost > But this is not that good enough here if my understanding is correct. > As vmv.v.x is somehow equivalent to vec_dup but doesn't ref GR2VR, But it should. Can't we do something like: if (riscv_v_ext_mode_p (mode)) { switch (GET_CODE (x)) { case VEC_DUPLICATE: // will add 0, 4, 8 in case of GR2VR = 0, 1, 2 *total += GR2VR * COST_N_INSNS (1); break; case PLUS: rtx op2 = XEXP (XEXP (x, 1), 0); if (GET_CODE (op2) == VEC_DUPLICATE) *total += GR2VR * COST_N_INSNS (1); ... break; Could you give a short example where that doesn't work so I can have a quick look myself? Thanks. -- Regards Robin