On Wed, 23 Apr 2025, Tamar Christina wrote:

> Hi All,
>
> This patch proposes a new vector cost model called "max".  The cost model
> is an intersection between two of our existing cost models.  Like
> `unlimited` it disables the costing vs scalar and assumes all vectorization
> to be profitable.
>
> But unlike unlimited it does not fully disable the vector cost model.  That
> means that we still perform comparisons between vector modes.
>
> As an example, the following:
>
> void
> foo (char *restrict a, int *restrict b, int *restrict c,
>      int *restrict d, int stride)
> {
>   if (stride <= 1)
>     return;
>
>   for (int i = 0; i < 3; i++)
>     {
>       int res = c[i];
>       int t = b[i * stride];
>       if (a[i] != 0)
>         res = t * d[i];
>       c[i] = res;
>     }
> }
>
> compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
> vectorize as it assumes scalar would be faster, and with
> -fvect-cost-model=unlimited it picks a vector type that's so big that the
> large sequence generated is working on mostly inactive lanes:
>
>         ...
>         and     p3.b, p3/z, p4.b, p4.b
>         whilelo p0.s, wzr, w7
>         ld1w    z23.s, p3/z, [x3, #3, mul vl]
>         ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
>         add     x0, x5, x0
>         punpklo p6.h, p6.b
>         ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
>         and     p6.b, p6/z, p0.b, p0.b
>         punpklo p4.h, p7.b
>         ld1w    z24.s, p6/z, [x3, #2, mul vl]
>         and     p4.b, p4/z, p2.b, p2.b
>         uqdecw  w6
>         ld1w    z26.s, p4/z, [x3]
>         whilelo p1.s, wzr, w6
>         mul     z27.s, p5/m, z27.s, z23.s
>         ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
>         punpkhi p7.h, p7.b
>         mul     z24.s, p5/m, z24.s, z28.s
>         and     p7.b, p7/z, p1.b, p1.b
>         mul     z26.s, p5/m, z26.s, z30.s
>         ld1w    z25.s, p7/z, [x3, #1, mul vl]
>         st1w    z27.s, p3, [x2, #3, mul vl]
>         mul     z25.s, p5/m, z25.s, z29.s
>         st1w    z24.s, p6, [x2, #2, mul vl]
>         st1w    z25.s, p7, [x2, #1, mul vl]
>         st1w    z26.s, p4, [x2]
>         ...
>
> With -fvect-cost-model=max you get more reasonable code:
>
> foo:
>         cmp     w4, 1
>         ble     .L1
>         ptrue   p7.s, vl3
>         index   z0.s, #0, w4
>         ld1b    z29.s, p7/z, [x0]
>         ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
>         ptrue   p6.b, all
>         cmpne   p7.b, p7/z, z29.b, #0
>         ld1w    z31.s, p7/z, [x3]
>         mul     z31.s, p6/m, z31.s, z30.s
>         st1w    z31.s, p7, [x2]
> .L1:
>         ret
>
> This model has been useful internally for performance exploration and
> cost-model validation.  It allows us to force realistic vectorization,
> overriding the cost model, to be able to tell whether it's correct wrt
> profitability.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> -m32, -m64 and no issues.
>
> Ok for master?

Hmm.  I don't like another cost model.  Instead how about changing
'unlimited' to still iterate through vector sizes?  Cost modeling
is really about vector vs. scalar, not vector vs. vector which is
completely under target control.  Targets should provide a way
to limit iteration, like aarch64 has with the aarch64-autovec-preference
--param or x86 has with -mprefer-vector-width.

Of course changing 'unlimited' might result in somewhat of a testsuite
churn, but still the fix there would be to inject a proper -mXYZ
or --param to get the old behavior back (or even consider cycling
through the different aarch64-autovec-preference settings for the
testsuite).

Richard.

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>         * common.opt (vect-cost-model, simd-cost-model): Add max cost model.
>         * doc/invoke.texi: Document it.
>         * flag-types.h (enum vect_cost_model): Add VECT_COST_MODEL_MAX.
>         * tree-vect-data-refs.cc (vect_peeling_hash_insert,
>         vect_peeling_hash_choose_best_peeling,
>         vect_enhance_data_refs_alignment): Use it.
>         * tree-vect-loop.cc (vect_analyze_loop_costing,
>         vect_estimate_min_profitable_iters): Likewise.
>
> ---
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 88d987e6ab14d9f8df7aa686efffc43418dbb42d..bd5e2e951f9388b12206d9addc736e336cd0e4ee 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -3442,11 +3442,11 @@ Enable basic block vectorization (SLP) on trees.
>
>  fvect-cost-model=
>  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_vect_cost_model) Init(VECT_COST_MODEL_DEFAULT) Optimization
> --fvect-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the cost model for vectorization.
> +-fvect-cost-model=[unlimited|max|dynamic|cheap|very-cheap] Specifies the cost model for vectorization.
>
>  fsimd-cost-model=
>  Common Joined RejectNegative Enum(vect_cost_model) Var(flag_simd_cost_model) Init(VECT_COST_MODEL_UNLIMITED) Optimization
> --fsimd-cost-model=[unlimited|dynamic|cheap|very-cheap] Specifies the vectorization cost model for code marked with a simd directive.
> +-fsimd-cost-model=[unlimited|max|dynamic|cheap|very-cheap] Specifies the vectorization cost model for code marked with a simd directive.
>
>  Enum
>  Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer cost model %qs)
> @@ -3454,6 +3454,9 @@ Name(vect_cost_model) Type(enum vect_cost_model) UnknownError(unknown vectorizer
>  EnumValue
>  Enum(vect_cost_model) String(unlimited) Value(VECT_COST_MODEL_UNLIMITED)
>
> +EnumValue
> +Enum(vect_cost_model) String(max) Value(VECT_COST_MODEL_MAX)
> +
>  EnumValue
>  Enum(vect_cost_model) String(dynamic) Value(VECT_COST_MODEL_DYNAMIC)
>
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 14a78fd236f64185fc129f18b52b20692d49305c..e7b242c9134ff17022c92f81c8b24762cfd59c6c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14449,9 +14449,11 @@ With the @samp{unlimited} model the vectorized code-path is assumed
>  to be profitable while with the @samp{dynamic} model a runtime check
>  guards the vectorized code-path to enable it only for iteration
>  counts that will likely execute faster than when executing the original
> -scalar loop.  The @samp{cheap} model disables vectorization of
> -loops where doing so would be cost prohibitive for example due to
> -required runtime checks for data dependence or alignment but otherwise
> +scalar loop.  The @samp{max} model similarly to the @samp{unlimited} model
> +assumes all vector code is profitable over scalar within loops but does not
> +disable the vector to vector costing.  The @samp{cheap} model disables
> +vectorization of loops where doing so would be cost prohibitive for example due
> +to required runtime checks for data dependence or alignment but otherwise
>  is equal to the @samp{dynamic} model.  The @samp{very-cheap} model disables
>  vectorization of loops when any runtime check for data dependence or alignment
>  is required, it also disables vectorization of epilogue loops but otherwise is
> diff --git a/gcc/flag-types.h b/gcc/flag-types.h
> index db573768c23d9f6809ae115e71370960314f16ce..1c941c295a2e608eae58c3e3fb0eba1284f731ca 100644
> --- a/gcc/flag-types.h
> +++ b/gcc/flag-types.h
> @@ -277,9 +277,10 @@ enum scalar_storage_order_kind {
>  /* Vectorizer cost-model.  Except for DEFAULT, the values are ordered from
>     the most conservative to the least conservative.  */
>  enum vect_cost_model {
> -  VECT_COST_MODEL_VERY_CHEAP = -3,
> -  VECT_COST_MODEL_CHEAP = -2,
> -  VECT_COST_MODEL_DYNAMIC = -1,
> +  VECT_COST_MODEL_VERY_CHEAP = -4,
> +  VECT_COST_MODEL_CHEAP = -3,
> +  VECT_COST_MODEL_DYNAMIC = -2,
> +  VECT_COST_MODEL_MAX = -1,
>    VECT_COST_MODEL_UNLIMITED = 0,
>    VECT_COST_MODEL_DEFAULT = 1
>  };
> diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
> index c9395e33fcdfc7deedd979c764daae93b15abace..5c56956c2edcb76210c36b60526f031011c8e0c7 100644
> --- a/gcc/tree-vect-data-refs.cc
> +++ b/gcc/tree-vect-data-refs.cc
> @@ -1847,7 +1847,9 @@ vect_peeling_hash_insert (hash_table<peel_info_hasher> *peeling_htab,
>    /* If this DR is not supported with unknown misalignment then bias
>       this slot when the cost model is disabled.  */
>    if (!supportable_if_not_aligned
> -      && unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> +      && (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> +          || loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> +               == VECT_COST_MODEL_MAX))
>      slot->count += VECT_MAX_COST;
>  }
>
> @@ -2002,7 +2004,8 @@ vect_peeling_hash_choose_best_peeling (hash_table<peel_info_hasher> *peeling_hta
>    res.peel_info.dr_info = NULL;
>    res.vinfo = loop_vinfo;
>
> -  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> +  if (!unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> +      && loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo)) != VECT_COST_MODEL_MAX)
>      {
>        res.inside_cost = INT_MAX;
>        res.outside_cost = INT_MAX;
> @@ -2348,7 +2351,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
>               We do this automatically for cost model, since we calculate
>               cost for every peeling option.  */
>            poly_uint64 nscalars = npeel_tmp;
> -          if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> +          if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo))
> +              || loop_cost_model (LOOP_VINFO_LOOP (loop_vinfo)) == VECT_COST_MODEL_MAX)
>              {
>                poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>                unsigned group_size = 1;
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 958b829fa8d1ad267fbde3be915719f3a51e6a38..5f3adc257f6581850f901c7747771f5931df942a 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -2407,7 +2407,8 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>                                        &min_profitable_estimate,
>                                        suggested_unroll_factor);
>
> -  if (min_profitable_iters < 0)
> +  if (min_profitable_iters < 0
> +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX)
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2430,7 +2431,8 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>    LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
> +      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th
> +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX)
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2490,6 +2492,7 @@ vect_analyze_loop_costing (loop_vec_info loop_vinfo,
>        estimated_niter = likely_max_stmt_executions_int (loop);
>      }
>    if (estimated_niter != -1
> +      && loop_cost_model (loop) != VECT_COST_MODEL_MAX
>        && ((unsigned HOST_WIDE_INT) estimated_niter
>            < MAX (th, (unsigned) min_profitable_estimate)))
>      {
> @@ -4638,7 +4641,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
>    vector_costs *target_cost_data = loop_vinfo->vector_costs;
>
>    /* Cost model disabled.  */
> -  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> +  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>      {
>        if (dump_enabled_p ())
>          dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
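
For anyone following the thread, the distinction being argued over (vector vs. scalar costing as opposed to vector vs. vector costing) can be sketched with a toy model.  This is only an illustration under stated assumptions: the names toy_cost_model, toy_candidate and toy_choose are hypothetical and do not exist in GCC, the costs are made-up integers, and the real vectorizer logic is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of the three behaviours discussed in the
   thread.  None of these names exist in GCC.  */
enum toy_cost_model { TOY_DYNAMIC, TOY_MAX, TOY_UNLIMITED };

struct toy_candidate
{
  const char *mode;  /* Label for a vector mode, illustrative only.  */
  int vector_cost;   /* Made-up cost of the vectorized loop.  */
};

/* Return the index of the chosen candidate, or -1 to keep the scalar loop.
   Candidates are assumed to be listed in the order they would be tried.  */
static int
toy_choose (enum toy_cost_model model, const struct toy_candidate *cand,
            int n, int scalar_cost)
{
  assert (cand != NULL || n == 0);
  if (n <= 0)
    return -1;

  /* "unlimited": no costing at all, so the first candidate wins.  */
  if (model == TOY_UNLIMITED)
    return 0;

  /* "max" and "dynamic": still compare the vector candidates with
     each other and keep the cheapest one.  */
  int best = 0;
  for (int i = 1; i < n; i++)
    if (cand[i].vector_cost < cand[best].vector_cost)
      best = i;

  /* Only "dynamic" also compares the winner against the scalar loop.  */
  if (model == TOY_DYNAMIC && cand[best].vector_cost >= scalar_cost)
    return -1;

  return best;
}
```

With two candidates costing 40 and 10 and a scalar cost of 5, this toy picks the first candidate under TOY_UNLIMITED, the cheaper second one under TOY_MAX, and stays scalar under TOY_DYNAMIC.  In these terms, the suggestion to have 'unlimited' still iterate through vector sizes would correspond to dropping the early return in the TOY_UNLIMITED branch, with the choice among sizes then left to target options.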