> Note on VLS test changes: When using only -fno-vect-cost-model (without
> -mrvv-vector-bits=zvl), the vectorizer prefers VLA vectorization even for
> these known-trip-count loops under 
> gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/.
> This seems to conflict with the VLS test intent.  I added 
> -mrvv-vector-bits=zvl
> to force VLS mode, but I'm not sure if this is the preferred approach.
> Should we instead allow VLA selection here, or is -mrvv-vector-bits=zvl
> the right fix?

You could try -mmax-vectorization which is like fno-vect-cost-model but still 
compares costs among the candidates.

> +  /* Cost of ordered (fold-left / strict) floating-point reductions.
> +     These are significantly more expensive than unordered (tree) reductions
> +     because RVV ordered reduction instructions (e.g. vfredosum) process
> +     elements sequentially.  */

I rarely complain about too many comments but throughout the patch they all 
read a bit "LLMy" :)  Strictly my own opinion, but for example, I don't think 
we need to give a reason here why they are more expensive.

> +     /* Detect reduction operations and apply type-specific reduction
> +        costs.  The vec_to_scalar cost kind represents the reduction
> +        operation itself (e.g. vredsum.vs, vfredosum.vs), so we replace
> +        the default vec_to_scalar_cost with a more precise per-type cost.
> +        For floating-point reductions, distinguish between ordered
> +        (fold-left, e.g. vfredosum) and unordered (tree, e.g. vfredusum)
> +        reductions since ordered reductions are significantly more
> +        expensive due to sequential processing.  */
> +     if (vectype
> +         && ((stmt_info && vect_is_reduction (stmt_info))
> +             || (node && vect_is_reduction (node))))
> +       {
> +         const common_vector_cost *common_costs
> +           = loop && riscv_vla_mode_p (loop->vector_mode)
> +             ? costs->vla : costs->vls;
> +
> +         bool is_ordered = false;
> +         if (FLOAT_TYPE_P (vectype) && loop && node)

Are both loop && node necessary here?

> +           {
> +             int reduc_type = vect_reduc_type (m_vinfo, node);
> +             is_ordered = (reduc_type == FOLD_LEFT_REDUCTION);
> +           }
> +
> +         int reduc_cost = 0;
> +         switch (GET_MODE_INNER (TYPE_MODE (vectype)))
> +           {
> +           case E_QImode:
> +             reduc_cost = common_costs->reduc_i8_cost;
> +             break;
> +           case E_HImode:
> +             reduc_cost = common_costs->reduc_i16_cost;
> +             break;
> +           case E_SImode:
> +             reduc_cost = common_costs->reduc_i32_cost;
> +             break;
> +           case E_DImode:
> +             reduc_cost = common_costs->reduc_i64_cost;
> +             break;
> +           case E_HFmode:
> +           case E_BFmode:
> +             reduc_cost = is_ordered
> +                          ? common_costs->reduc_f16_ordered_cost
> +                          : common_costs->reduc_f16_cost;
> +             break;
> +           case E_SFmode:
> +             reduc_cost = is_ordered
> +                          ? common_costs->reduc_f32_ordered_cost
> +                          : common_costs->reduc_f32_cost;
> +             break;
> +           case E_DFmode:
> +             reduc_cost = is_ordered
> +                          ? common_costs->reduc_f64_ordered_cost
> +                          : common_costs->reduc_f64_cost;
> +             break;
> +           default:
> +             break;
> +           }
> +
> +         if (reduc_cost)
> +           stmt_cost = reduc_cost;
> +       }

The function was already a bit bulky before and I think it would be easier to 
read if you factored out the reduction cost like:
  int reduc_cost = 0;
  if (is_reduction ())
    reduc_cost = get_reduction_cost (...);

We could/should do the same for other parts (like the lane handling) in the 
future.

> +DEF_REDUC_PLUS (_Float16)
> +DEF_REDUC_PLUS (float)
> +DEF_REDUC_PLUS (double)
> +
> +/* Without -ffast-math, FP reductions use ordered (fold-left) mode.
> +   With high ordered reduction costs, vectorization may be rejected as
> +   unprofitable, but the cost model should still compute and report
> +   the correct per-type ordered reduction costs in the vect dump.
> +
> +   Verify the ordered reduction cost is reflected in the cost model dump.
> +   For ordered reductions: reduc_f*_ordered_cost + vr2fr (2),
> +   where reduc_f*_ordered_cost replaces the default vec_to_scalar_cost.
> +   f16: reduc_f16_ordered_cost (20) + vr2fr (2) = 22
> +   f32: reduc_f32_ordered_cost (10) + vr2fr (2) = 12
> +   f64: reduc_f64_ordered_cost (5)  + vr2fr (2) = 7  */

Apologies, but this, again, feels slightly too verbose here.

> +/* Verify the ordered reduction cost is reflected in the cost model dump.
> +   For ordered reductions: reduc_f*_ordered_cost + vr2fr (2),
> +   where reduc_f*_ordered_cost replaces the default vec_to_scalar_cost.
> +   f16: reduc_f16_ordered_cost (20) + vr2fr (2) = 22
> +   f32: reduc_f32_ordered_cost (10) + vr2fr (2) = 12
> +   f64: reduc_f64_ordered_cost (5)  + vr2fr (2) = 7
> +   Note: ordered FP reductions may not be vectorized with the cost model
> +   enabled (cost too high), but the costs are still reported in the dump.  */

In particular when it's duplicated :)

-- 
Regards
 Robin

Reply via email to