> target_insn_cost is used to prevent rpad optimization to be restored by > late_combine1, looks like it's not sufficient for size_cost. > > 21804static int > 21805ix86_insn_cost (rtx_insn *insn, bool speed) > 21806{ > 21807 int insn_cost = 0; > 21808 /* Add extra cost to avoid post_reload late_combine revert > 21809 the optimization did in pass_rpad. */ > 21810 if (reload_completed > 21811 && ix86_rpad_gate () > 21812 && recog_memoized (insn) >= 0 > 21813 && get_attr_avx_partial_xmm_update (insn) > 21814 == AVX_PARTIAL_XMM_UPDATE_TRUE) > 21815 insn_cost += COSTS_N_INSNS (3); > 21816 > 21817 return insn_cost + pattern_cost (PATTERN (insn), speed); > 21818}
This is well-hidden. I was looking for that in xi86_rtl_costs but did not notice this one. Indeed with the patch instruction seem to cost 8, so 3 is too low. > > And we can use vec_series_lowpart_p instead of only check 1 element. This sadly does not work. vec_series_lowpart_p calls can_cange_mode_class on ALL_REGS and this return false since x87 does not support subregs at all. So it is constant false on x86 targets. This should be fixed too, but incrementally. I added FIXME comment for that and will check if we can restrict the early exit only to modes that we can store in x87 registers. I also had to increase cost to 1. While the argument that these lowparts are essentially equivalent to subregs that cost 0 is OK, we are special casing subregs in many parts of middle-end and we don't handle lowparts that well. Cost of 0 disables CSE and that in turn leads to worse code as described in the comment. I also noticed that costing of VEC_* does not recurse to parameters which makes it cost i.e. memory operand same as register operand. Also something I plan to track incrementally. Bootstrapped/regtested x86_64-linux, comitted. gcc/ChangeLog: PRPR target/119900 * config/i386/i386.cc (ix86_can_change_mode_class): Add TODO comment. (ix86_rtx_costs): Make VEC_SELECT equivalent to SUBREG cost 1. diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc index 2f840338138..98c319cbd05 100644 --- a/gcc/config/i386/i386.cc +++ b/gcc/config/i386/i386.cc @@ -20973,7 +20973,11 @@ ix86_can_change_mode_class (machine_mode from, machine_mode to, return true; /* x87 registers can't do subreg at all, as all values are reformatted - to extended precision. */ + to extended precision. + + ??? middle-end queries mode changes for ALL_REGS and this makes + vec_series_lowpart_p to always return false. We probably should + restrict this to modes supported by i387 and check if it is enabled. */ if (MAYBE_FLOAT_CLASS_P (regclass)) return false; @@ -22751,13 +22755,41 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, } return false; - case VEC_SELECT: case VEC_CONCAT: /* ??? Assume all of these vector manipulation patterns are recognizable. In which case they all pretty much have the - same cost. */ + same cost. + ??? We should still recruse when computing cost. */ *total = cost->sse_op; return true; + + case VEC_SELECT: + /* Special case extracting lower part from the vector. + This by itself needs to code and most of SSE/AVX instructions have + packed and single forms where the single form may be represented + by such VEC_SELECT. + + Use cost 1 (despite the fact that functionally equivalent SUBREG has + cost 0). Making VEC_SELECT completely free, for example instructs CSE + to forward propagate VEC_SELECT into + + (set (reg eax) (reg src)) + + which then prevents fwprop and combining. See i.e. + gcc.target/i386/pr91103-1.c. + + ??? rtvec_series_p test should be, for valid patterns, equivalent to + vec_series_lowpart_p but is not, since the latter calls + can_cange_mode_class on ALL_REGS and this return false since x87 does + not support subregs at all. */ + if (rtvec_series_p (XVEC (XEXP (x, 1), 0), 0)) + *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), + outer_code, opno, speed) + 1; + else + /* ??? We should still recruse when computing cost. */ + *total = cost->sse_op; + return true; + case VEC_DUPLICATE: *total = rtx_cost (XEXP (x, 0), GET_MODE (XEXP (x, 0)), @@ -22775,6 +22807,7 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno, if (TARGET_AVX512F && register_operand (mask, GET_MODE (mask))) *total = rtx_cost (XEXP (x, 0), mode, outer_code, opno, speed); else + /* ??? We should still recruse when computing cost. */ *total = cost->sse_op; return true;