On Wed, 23 Apr 2025, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Wednesday, April 23, 2025 10:14 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: Richard Sandiford <richard.sandif...@arm.com>; gcc-patches@gcc.gnu.org;
> > nd <n...@arm.com>
> > Subject: RE: [PATCH]middle-end: Add new "max" vector cost model
> > 
> > On Wed, 23 Apr 2025, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Sandiford <richard.sandif...@arm.com>
> > > > Sent: Wednesday, April 23, 2025 9:45 AM
> > > > To: Tamar Christina <tamar.christ...@arm.com>
> > > > Cc: Richard Biener <rguent...@suse.de>; gcc-patches@gcc.gnu.org; nd
> > > > <n...@arm.com>
> > > > Subject: Re: [PATCH]middle-end: Add new "max" vector cost model
> > > > 
> > > > Tamar Christina <tamar.christ...@arm.com> writes:
> > > > >> -----Original Message-----
> > > > >> From: Richard Biener <rguent...@suse.de>
> > > > >> Sent: Wednesday, April 23, 2025 9:31 AM
> > > > >> To: Tamar Christina <tamar.christ...@arm.com>
> > > > >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Sandiford
> > > > >> <richard.sandif...@arm.com>
> > > > >> Subject: Re: [PATCH]middle-end: Add new "max" vector cost model
> > > > >> 
> > > > >> On Wed, 23 Apr 2025, Tamar Christina wrote:
> > > > >> 
> > > > >> > Hi All,
> > > > >> > 
> > > > >> > This patch proposes a new vector cost model called "max".  The cost
> > > > >> > model is an intersection between two of our existing cost models.
> > > > >> > Like `unlimited` it disables the costing vs scalar and assumes all
> > > > >> > vectorization to be profitable.
> > > > >> > 
> > > > >> > But unlike unlimited it does not fully disable the vector cost
> > > > >> > model.  That means that we still perform comparisons between
> > > > >> > vector modes.
> > > > >> > 
> > > > >> > As an example, the following:
> > > > >> > 
> > > > >> > void
> > > > >> > foo (char *restrict a, int *restrict b, int *restrict c,
> > > > >> >      int *restrict d, int stride)
> > > > >> > {
> > > > >> >     if (stride <= 1)
> > > > >> >       return;
> > > > >> > 
> > > > >> >     for (int i = 0; i < 3; i++)
> > > > >> >       {
> > > > >> >         int res = c[i];
> > > > >> >         int t = b[i * stride];
> > > > >> >         if (a[i] != 0)
> > > > >> >           res = t * d[i];
> > > > >> >         c[i] = res;
> > > > >> >       }
> > > > >> > }
> > > > >> > 
> > > > >> > compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic
> > > > >> > fails to vectorize as it assumes scalar would be faster, and with
> > > > >> > -fvect-cost-model=unlimited it picks a vector type that's so big
> > > > >> > that the large sequence generated is working on mostly inactive
> > > > >> > lanes:
> > > > >> > 
> > > > >> > ...
> > > > >> >         and     p3.b, p3/z, p4.b, p4.b
> > > > >> >         whilelo p0.s, wzr, w7
> > > > >> >         ld1w    z23.s, p3/z, [x3, #3, mul vl]
> > > > >> >         ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
> > > > >> >         add     x0, x5, x0
> > > > >> >         punpklo p6.h, p6.b
> > > > >> >         ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
> > > > >> >         and     p6.b, p6/z, p0.b, p0.b
> > > > >> >         punpklo p4.h, p7.b
> > > > >> >         ld1w    z24.s, p6/z, [x3, #2, mul vl]
> > > > >> >         and     p4.b, p4/z, p2.b, p2.b
> > > > >> >         uqdecw  w6
> > > > >> >         ld1w    z26.s, p4/z, [x3]
> > > > >> >         whilelo p1.s, wzr, w6
> > > > >> >         mul     z27.s, p5/m, z27.s, z23.s
> > > > >> >         ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
> > > > >> >         punpkhi p7.h, p7.b
> > > > >> >         mul     z24.s, p5/m, z24.s, z28.s
> > > > >> >         and     p7.b, p7/z, p1.b, p1.b
> > > > >> >         mul     z26.s, p5/m, z26.s, z30.s
> > > > >> >         ld1w    z25.s, p7/z, [x3, #1, mul vl]
> > > > >> >         st1w    z27.s, p3, [x2, #3, mul vl]
> > > > >> >         mul     z25.s, p5/m, z25.s, z29.s
> > > > >> >         st1w    z24.s, p6, [x2, #2, mul vl]
> > > > >> >         st1w    z25.s, p7, [x2, #1, mul vl]
> > > > >> >         st1w    z26.s, p4, [x2]
> > > > >> > ...
> > > > >> > 
> > > > >> > With -fvect-cost-model=max you get more reasonable code:
> > > > >> > 
> > > > >> > foo:
> > > > >> >         cmp     w4, 1
> > > > >> >         ble     .L1
> > > > >> >         ptrue   p7.s, vl3
> > > > >> >         index   z0.s, #0, w4
> > > > >> >         ld1b    z29.s, p7/z, [x0]
> > > > >> >         ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
> > > > >> >         ptrue   p6.b, all
> > > > >> >         cmpne   p7.b, p7/z, z29.b, #0
> > > > >> >         ld1w    z31.s, p7/z, [x3]
> > > > >> >         mul     z31.s, p6/m, z31.s, z30.s
> > > > >> >         st1w    z31.s, p7, [x2]
> > > > >> > .L1:
> > > > >> >         ret
> > > > >> > 
> > > > >> > This model has been useful internally for performance exploration
> > > > >> > and cost-model validation.  It allows us to force realistic
> > > > >> > vectorization, overriding the cost model, to be able to tell
> > > > >> > whether it's correct wrt profitability.
> > > > >> > 
> > > > >> > Bootstrapped and regtested on aarch64-none-linux-gnu,
> > > > >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu -m32, -m64 and no
> > > > >> > issues.
> > > > >> > 
> > > > >> > Ok for master?
> > > > >> 
> > > > >> Hmm.  I don't like another cost model.  Instead how about changing
> > > > >> 'unlimited' to still iterate through vector sizes?  Cost modeling
> > > > >> is really about vector vs. scalar, not vector vs. vector which is
> > > > >> completely under target control.  Targets should provide a way
> > > > >> to limit iteration, like aarch64 has with the
> > > > >> aarch64-autovec-preference --param or x86 has with
> > > > >> -mprefer-vector-width.
> > > > >> 
> > > > >
> > > > > I'm ok with changing 'unlimited' if that's preferred, but I do want
> > > > > to point out that we don't have enough control with the current
> > > > > --param or -m options to simulate all cases.
> > > > >
> > > > > For instance for SVE there's no way for us to force a smaller type to
> > > > > be used and thus force an unpacking to happen.  Or there's no way to
> > > > > force an unrolling with Adv. SIMD.
> > > > >
> > > > > Basically there's not enough control over the VF to exercise some
> > > > > tests reliably.  Some tests explicitly relied on unlimited just
> > > > > picking the first mode.
> > > > 
> > > > FWIW, adding extra AArch64 --params sounds ok to me.  The ones we have
> > > > were just added on an as-needed/as-wanted basis, rather than as an
> > > > attempt to be complete.
> > > > 
> > > > After the aarch64-autovec-preference backward-compatibility controversy,
> > > > we should consider whether what we add is something that is intended for
> > > > developers and can be taken away at any time (--param), or whether it's
> > > > something that we promise to support going forward (-m).
> > > 
> > > Fair enough, the target doesn't have enough control over the vector
> > > costing strategy here.  So I guess the implementation for this would be
> > > to raise scalar costing to an extreme degree such that 'dynamic' will
> > > always vectorize and still do the inner vector mode comparisons?
> > 
> > I'd think a --param to "scale" the scalar (or vector) cost so one could
> > gradually get "more" vectorization might sound useful.
> > 
> > That said, we do want to get the chance to look at the cases where
> > vector costs are larger than scalar costs - for x86 it's most of the
> > time cases that show the "fixes" we put in for the lack of proper modeling
> > of a CPU pipeline bite back in some cases.
> 
> Yeah, that's one of the reasons for this patch :)
> 
> So joining the three threads, it sounds like the following conclusion:
> 
> - no new cost model
> - provide a parameter to scale scalar costing (I'd scale scalar costing up
>   rather than vector down, otherwise the fractional costing can become iffy
>   if the values collapse to 0.)
> - provide a parameter to select VF or mode
> 
> The only remaining question now is whether these should be backend
> parameters or generic parameters usable for all targets.
The cost scaling could be generic; the iteration thing is necessarily
target specific.

Richard.

> Thanks,
> Tamar
> 
> > 
> > Richard.
> > 
> > > Still I'm hoping we could do something generic here as I think it's
> > > useful for everyone, but I will prepare a backend patch if this isn't
> > > the case.
> > > 
> > > If we do, my vote would be for a `-m` option as the use cases for this
> > > would be relied on in infrastructure for a long time.
> > > 
> > > Thanks,
> > > Tamar
> > > 
> > > > 
> > > > Thanks,
> > > > Richard
> > > 
> > 
> > -- 
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
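
As a rough sketch of how the options discussed in this thread are driven on
the command line (treat this as illustrative only: -fvect-cost-model=max is
merely what this patch proposes, and the exact spellings of the
target-specific knob values differ between GCC releases, so check the
installed compiler's documentation):

    # Generic cost-model selection, using the foo example from the patch:
    gcc -O3 -march=armv8-a+sve -fvect-cost-model=dynamic   foo.c  # full scalar-vs-vector costing
    gcc -O3 -march=armv8-a+sve -fvect-cost-model=unlimited foo.c  # no costing, first mode wins
    gcc -O3 -march=armv8-a+sve -fvect-cost-model=max       foo.c  # proposed: skip the scalar
                                                                  # comparison but still compare
                                                                  # vector modes against each other

    # Existing target-specific ways to limit which modes the vectorizer iterates over
    # (argument values here are assumptions, not taken from this thread):
    gcc -O3 -march=armv8-a+sve --param=aarch64-autovec-preference=sve-only foo.c
    gcc -O3 -march=x86-64-v3 -mprefer-vector-width=128 foo.c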