> -----Original Message-----
> From: Richard Sandiford <richard.sandif...@arm.com>
> Sent: Wednesday, April 23, 2025 9:45 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: Richard Biener <rguent...@suse.de>; gcc-patches@gcc.gnu.org; nd
> <n...@arm.com>
> Subject: Re: [PATCH]middle-end: Add new "max" vector cost model
> 
> Tamar Christina <tamar.christ...@arm.com> writes:
> >> -----Original Message-----
> >> From: Richard Biener <rguent...@suse.de>
> >> Sent: Wednesday, April 23, 2025 9:31 AM
> >> To: Tamar Christina <tamar.christ...@arm.com>
> >> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Sandiford
> >> <richard.sandif...@arm.com>
> >> Subject: Re: [PATCH]middle-end: Add new "max" vector cost model
> >>
> >> On Wed, 23 Apr 2025, Tamar Christina wrote:
> >>
> >> > Hi All,
> >> >
> >> > This patch proposes a new vector cost model called "max".  The cost 
> >> > model is
> an
> >> > intersection between two of our existing cost models.  Like `unlimited` 
> >> > it
> >> > disables the costing vs scalar and assumes all vectorization to be 
> >> > profitable.
> >> >
> >> > But unlike unlimited it does not fully disable the vector cost model.  
> >> > That
> >> > means that we still perform comparisons between vector modes.
> >> >
> >> > As an example, the following:
> >> >
> >> > void
> >> > foo (char *restrict a, int *restrict b, int *restrict c,
> >> >      int *restrict d, int stride)
> >> > {
> >> >     if (stride <= 1)
> >> >         return;
> >> >
> >> >     for (int i = 0; i < 3; i++)
> >> >         {
> >> >             int res = c[i];
> >> >             int t = b[i * stride];
> >> >             if (a[i] != 0)
> >> >                 res = t * d[i];
> >> >             c[i] = res;
> >> >         }
> >> > }
> >> >
> >> > compiled with -O3 -march=armv8-a+sve -fvect-cost-model=dynamic fails to
> >> > vectorize as it assumes scalar would be faster, and with
> >> > -fvect-cost-model=unlimited it picks a vector type that's so big that 
> >> > the large
> >> > sequence generated is working on mostly inactive lanes:
> >> >
> >> >         ...
> >> >         and     p3.b, p3/z, p4.b, p4.b
> >> >         whilelo p0.s, wzr, w7
> >> >         ld1w    z23.s, p3/z, [x3, #3, mul vl]
> >> >         ld1w    z28.s, p0/z, [x5, z31.s, sxtw 2]
> >> >         add     x0, x5, x0
> >> >         punpklo p6.h, p6.b
> >> >         ld1w    z27.s, p4/z, [x0, z31.s, sxtw 2]
> >> >         and     p6.b, p6/z, p0.b, p0.b
> >> >         punpklo p4.h, p7.b
> >> >         ld1w    z24.s, p6/z, [x3, #2, mul vl]
> >> >         and     p4.b, p4/z, p2.b, p2.b
> >> >         uqdecw  w6
> >> >         ld1w    z26.s, p4/z, [x3]
> >> >         whilelo p1.s, wzr, w6
> >> >         mul     z27.s, p5/m, z27.s, z23.s
> >> >         ld1w    z29.s, p1/z, [x4, z31.s, sxtw 2]
> >> >         punpkhi p7.h, p7.b
> >> >         mul     z24.s, p5/m, z24.s, z28.s
> >> >         and     p7.b, p7/z, p1.b, p1.b
> >> >         mul     z26.s, p5/m, z26.s, z30.s
> >> >         ld1w    z25.s, p7/z, [x3, #1, mul vl]
> >> >         st1w    z27.s, p3, [x2, #3, mul vl]
> >> >         mul     z25.s, p5/m, z25.s, z29.s
> >> >         st1w    z24.s, p6, [x2, #2, mul vl]
> >> >         st1w    z25.s, p7, [x2, #1, mul vl]
> >> >         st1w    z26.s, p4, [x2]
> >> >         ...
> >> >
> >> > With -fvect-cost-model=max you get more reasonable code:
> >> >
> >> > foo:
> >> >         cmp     w4, 1
> >> >         ble     .L1
> >> >         ptrue   p7.s, vl3
> >> >         index   z0.s, #0, w4
> >> >         ld1b    z29.s, p7/z, [x0]
> >> >         ld1w    z30.s, p7/z, [x1, z0.s, sxtw 2]
> >> >  ptrue   p6.b, all
> >> >         cmpne   p7.b, p7/z, z29.b, #0
> >> >         ld1w    z31.s, p7/z, [x3]
> >> >  mul     z31.s, p6/m, z31.s, z30.s
> >> >         st1w    z31.s, p7, [x2]
> >> > .L1:
> >> >         ret
> >> >
> >> > This model has been useful internally for performance exploration and 
> >> > cost-
> >> model
> >> > validation.  It allows us to force realistic vectorization overriding 
> >> > the cost
> >> > model to be able to tell whether it's correct wrt to profitability.
> >> >
> >> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu
> >> > -m32, -m64 and no issues.
> >> >
> >> > Ok for master?
> >>
> >> Hmm.  I don't like another cost model.  Instead how about changing
> >> 'unlimited' to still iterate through vector sizes?  Cost modeling
> >> is really about vector vs. scalar, not vector vs. vector which is
> >> completely under target control.  Targets should provide a way
> >> to limit iteration, like aarch64 has with the aarch64-autovec-preference
> >> --param or x86 has with -mprefer-vector-width.
> >>
> >
> > I'm ok with changing 'unlimited' if that's preferred, but I do want to point
> > out that we don't have enough control with current --param or -m options
> > to simulate all cases.
> >
> > For instance for SVE there's way for us to force a smaller type to be used
> > and thus force an unpacking to happen.  Or there's no way to force an
> > unrolling with Adv. SIMD.
> >
> > Basically there's not enough control over the VF to exercise some tests
> > reliably.  Some tests explicitly relied on unlimited just picking the first
> > mode.
> 
> FWIW, adding extra AArch64 --params sounds ok to me.  The ones we have
> were just added on an as-needed/as-wanted basis, rather than as an attempt
> to be complete.
> 
> After the aarch64-autovec-preference backward-compatibility controversy,
> we should consider whether what we add is something that is intended for
> developers and can be taken away at any time (--param), or whether it's
> something that we promise to support going forward (-m).

Fair enough, the target doesn't have enough control over the vector costing
strategy here.  So I guess the implementation for this would be to raise scalar
costing to an extreme degree such that 'dynamic' will always vectorize and
still do the inner vector mode comparisons?

Still I'm hoping we could do something generic here as I think it's useful for
everyone but will prepare a backend patch if this isn't the case.

If we do, my vote would be for a `-m` option as the use cases for this would
be used in infrastructures for a long time.

Thanks,
Tamar

> 
> Thanks,
> Richard

Reply via email to