> Note on VLS test changes: When using only -fno-vect-cost-model (without
> -mrvv-vector-bits=zvl), the vectorizer prefers VLA vectorization even for
> these known-trip-count loops under
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/.
> This seems to conflict with the VLS test intent. I added
> -mrvv-vector-bits=zvl
> to force VLS mode, but I'm not sure if this is the preferred approach.
> Should we instead allow VLA selection here, or is -mrvv-vector-bits=zvl
> the right fix?
You could try -mmax-vectorization which is like fno-vect-cost-model but still
compares costs among the candidates.
> + /* Cost of ordered (fold-left / strict) floating-point reductions.
> + These are significantly more expensive than unordered (tree) reductions
> + because RVV ordered reduction instructions (e.g. vfredosum) process
> + elements sequentially. */
I rarely complain about too many comments but throughout the patch they all
read a bit "LLMy" :) Strictly my own opinion, but for example, I don't think
we need to give a reason here why they are more expensive.
> + /* Detect reduction operations and apply type-specific reduction
> + costs. The vec_to_scalar cost kind represents the reduction
> + operation itself (e.g. vredsum.vs, vfredosum.vs), so we replace
> + the default vec_to_scalar_cost with a more precise per-type cost.
> + For floating-point reductions, distinguish between ordered
> + (fold-left, e.g. vfredosum) and unordered (tree, e.g. vfredusum)
> + reductions since ordered reductions are significantly more
> + expensive due to sequential processing. */
> + if (vectype
> + && ((stmt_info && vect_is_reduction (stmt_info))
> + || (node && vect_is_reduction (node))))
> + {
> + const common_vector_cost *common_costs
> + = loop && riscv_vla_mode_p (loop->vector_mode)
> + ? costs->vla : costs->vls;
> +
> + bool is_ordered = false;
> + if (FLOAT_TYPE_P (vectype) && loop && node)
Are both loop && node necessary here?
> + {
> + int reduc_type = vect_reduc_type (m_vinfo, node);
> + is_ordered = (reduc_type == FOLD_LEFT_REDUCTION);
> + }
> +
> + int reduc_cost = 0;
> + switch (GET_MODE_INNER (TYPE_MODE (vectype)))
> + {
> + case E_QImode:
> + reduc_cost = common_costs->reduc_i8_cost;
> + break;
> + case E_HImode:
> + reduc_cost = common_costs->reduc_i16_cost;
> + break;
> + case E_SImode:
> + reduc_cost = common_costs->reduc_i32_cost;
> + break;
> + case E_DImode:
> + reduc_cost = common_costs->reduc_i64_cost;
> + break;
> + case E_HFmode:
> + case E_BFmode:
> + reduc_cost = is_ordered
> + ? common_costs->reduc_f16_ordered_cost
> + : common_costs->reduc_f16_cost;
> + break;
> + case E_SFmode:
> + reduc_cost = is_ordered
> + ? common_costs->reduc_f32_ordered_cost
> + : common_costs->reduc_f32_cost;
> + break;
> + case E_DFmode:
> + reduc_cost = is_ordered
> + ? common_costs->reduc_f64_ordered_cost
> + : common_costs->reduc_f64_cost;
> + break;
> + default:
> + break;
> + }
> +
> + if (reduc_cost)
> + stmt_cost = reduc_cost;
> + }
The function was already a bit bulky before and I think it would be easier to
read if you factored out the reduction cost like:
int reduc_cost = 0;
if (is_reduction ())
reduc_cost = get_reduction_cost (...);
We could/should do the same for other parts (like the lane handling) in the
future.
> +DEF_REDUC_PLUS (_Float16)
> +DEF_REDUC_PLUS (float)
> +DEF_REDUC_PLUS (double)
> +
> +/* Without -ffast-math, FP reductions use ordered (fold-left) mode.
> + With high ordered reduction costs, vectorization may be rejected as
> + unprofitable, but the cost model should still compute and report
> + the correct per-type ordered reduction costs in the vect dump.
> +
> + Verify the ordered reduction cost is reflected in the cost model dump.
> + For ordered reductions: reduc_f*_ordered_cost + vr2fr (2),
> + where reduc_f*_ordered_cost replaces the default vec_to_scalar_cost.
> + f16: reduc_f16_ordered_cost (20) + vr2fr (2) = 22
> + f32: reduc_f32_ordered_cost (10) + vr2fr (2) = 12
> + f64: reduc_f64_ordered_cost (5) + vr2fr (2) = 7 */
Apologies, but this, again, feels slightly too verbose here.
> +/* Verify the ordered reduction cost is reflected in the cost model dump.
> + For ordered reductions: reduc_f*_ordered_cost + vr2fr (2),
> + where reduc_f*_ordered_cost replaces the default vec_to_scalar_cost.
> + f16: reduc_f16_ordered_cost (20) + vr2fr (2) = 22
> + f32: reduc_f32_ordered_cost (10) + vr2fr (2) = 12
> + f64: reduc_f64_ordered_cost (5) + vr2fr (2) = 7
> + Note: ordered FP reductions may not be vectorized with the cost model
> + enabled (cost too high), but the costs are still reported in the dump. */
In particular when it's duplicated :)
--
Regards
Robin