On Fri, 8 May 2026, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener <[email protected]> > > Sent: 08 May 2026 08:13 > > To: Tamar Christina <[email protected]> > > Cc: [email protected]; nd <[email protected]> > > Subject: Re: [PATCH 2/2] scev: maintain affine CHRECs in the presence of > > type > > conversions > > > > On Thu, 7 May 2026, Tamar Christina wrote: > > > > > The example > > > > > > float *e; > > > void f (float *f, float *g, char *h, int n, > > > int b, int c, int d) > > > { > > > float a = 0; > > > for (int i = 0; i < n; ++i) { > > > int j = b + i, k = c + i * d; > > > float l = g[j], m = h[i] ? g[k] : l; > > > a += f[i] * m; > > > } > > > *e = a; > > > } > > > > > > gets vectorized using gathers for the access to g: > > > > > > .L5: > > > ld1b z4.s, p7/z, [x2, x6] > > > cmpne p6.b, p7/z, z4.b, #0 > > > ld1w z2.s, p7/z, [x0, x6, lsl 2] > > > add z7.s, z30.s, z16.s > > > add z6.s, z16.s, z18.s > > > add x6, x6, x7 > > > ld1w z5.s, p7/z, [x1, z6.s, sxtw 2] > > > ld1w z3.s, p6/z, [x1, z7.s, sxtw 2] > > > incw z16.s > > > sel z3.s, p6, z3.s, z5.s > > > fmla z17.s, p7/m, z2.s, z3.s > > > whilelo p7.s, w6, w3 > > > b.any .L5 > > > > > > however the first g is g[b+i] and second is g[c + i*d]; > > > > > > since b is loop invariant the access to g[b+i] is actually linear and > > > since c > > > is loop invariant, then the base of the second access g[c + i *d] can be > > > simplified by recognizing the base as g + c. > > > > > > Today however SCEV fails to analyze these accesses as affine and as a > > > consequence we end up with gathers: > > > > > > : missed: failed: evolution of base is not affine. > > > base_address: > > > offset from base address: > > > constant offset from base address: > > > step: > > > base alignment: 0 > > > base misalignment: 0 > > > offset alignment: 0 > > > step alignment: 0 > > > base_object: *_63 > > > > > > Looking at SCEV this is because of an outer cast around the CHREC: > > > > > > ) > > > (set_scalar_evolution > > > instantiated_below = 25 > > > (scalar = _65) > > > (scalar_evolution = (long unsigned int) {b_22(D), +, 1}_2)) > > > ) > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = (long unsigned int) {b_22(D), +, 1}_2) > > > > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = g_27(D)) > > > (res = g_27(D))) > > > > > > which corresponds to > > > > > > j_66 = b_22(D) + i_67; > > > _65 = (long unsigned int) j_66; > > > _64 = _65 * 4; > > > _63 = g_27(D) + _64; > > > l_62 = *_63; > > > > > > and the _64 is deemed to not be affine: > > > > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = _64) > > > (analyze_scalar_evolution > > > (loop_nb = 2) > > > (scalar = _64) > > > (get_scalar_evolution > > > (scalar = _64) > > > (scalar_evolution = _64)) > > > ) > > > (res = scev_not_known)) > > > > > > This patch fixes it by (very carefully) folding a multiply on an unsigned > > > affine > > > CHREC into the CHREC itself. > > > > > > which results in > > > > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = 4) > > > (res = 4)) > > > (set_scalar_evolution > > > instantiated_below = 25 > > > (scalar = _64) > > > (scalar_evolution = {(long unsigned int) b_22(D) * 4, +, 4}_2)) > > > ) > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = g_27(D)) > > > (res = g_27(D))) > > > (instantiate_scev > > > (instantiate_below = 25 -> 12) > > > (evolution_loop = 2) > > > (chrec = {(long unsigned int) b_22(D) * 4, +, 4}_2) > > > (res = {(long unsigned int) b_22(D) * 4, +, 4}_2)) > > > (set_scalar_evolution > > > instantiated_below = 25 > > > (scalar = _63) > > > (scalar_evolution = {g_27(D) + (long unsigned int) b_22(D) * 4, +, > > > 4}_2)) > > > ) > > > > > > and dataref now correctly analyzes the base > > > > > > base_address: g_27(D) + (sizetype) b_22(D) * 4 > > > offset from base address: 0 > > > constant offset from base address: 0 > > > step: 4 > > > base alignment: 4 > > > base misalignment: 0 > > > offset alignment: 128 > > > step alignment: 4 > > > base_object: *g_27(D) + (sizetype) b_22(D) * 4 > > > Access function 0: {0B, +, 4}_2 > > > > > > producing the final codegen: > > > > > > .L7: > > > ld1b z4.s, p7/z, [x2, x6] > > > cmpne p6.b, p7/z, z4.b, #0 > > > ld1w z29.s, p7/z, [x4, x6, lsl 2] > > > ld1w z2.s, p7/z, [x0, x6, lsl 2] > > > ld1w z3.s, p6/z, [x5] > > > add x6, x6, x7 > > > sel z3.s, p6, z3.s, z29.s > > > add x5, x5, x1 > > > fmla z30.s, p7/m, z2.s, z3.s > > > whilelo p7.s, w6, w3 > > > b.any .L7 > > > faddv s31, p5, z30.s > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu, > > > arm-none-linux-gnueabihf, x86_64-pc-linux-gnu > > > -m32, -m64 and no issues. > > > Ok for master? > > > > > > Thanks, > > > Tamar > > > > > > gcc/ChangeLog: > > > > > > * tree-chrec.cc (chrec_fold_multiply): Fold unsigned CHREC mult. > > > > > > gcc/testsuite/ChangeLog: > > > > > > * gcc.dg/vect/vect-scev-affine_1.c: New test. > > > > > > --- > > > diff --git a/gcc/testsuite/gcc.dg/vect/vect-scev-affine_1.c > > b/gcc/testsuite/gcc.dg/vect/vect-scev-affine_1.c > > > new file mode 100644 > > > index > > 0000000000000000000000000000000000000000..929012184e0a2595af > > 826d3d06284d0a6a510119 > > > --- /dev/null > > > +++ b/gcc/testsuite/gcc.dg/vect/vect-scev-affine_1.c > > > @@ -0,0 +1,17 @@ > > > +/* { dg-do compile } */ > > > +/* { dg-require-effective-target vect_float } */ > > > + > > > +float *e; > > > +void f (float *f, float *g, char *h, int n, > > > + int b, int c, int d) > > > +{ > > > + float a = 0; > > > + for (int i = 0; i < n; ++i) { > > > + int j = b + i, k = c + i * d; > > > + float l = g[j], m = h[i] ? g[k] : l; > > > + a += f[i] * m; > > > + } > > > + *e = a; > > > +} > > > + > > > +/* { dg-final { scan-tree-dump-not {failed: evolution of base is not > > > affine} > > "vect" { target aarch64*-*-* } } } */ > > > diff --git a/gcc/tree-chrec.cc b/gcc/tree-chrec.cc > > > index > > 09dd81900bce70138f975c68b77c4ba6d0e45fc3..ff77c7a6c2397f65f3ee17 > > a386408b5ceec4676d 100644 > > > --- a/gcc/tree-chrec.cc > > > +++ b/gcc/tree-chrec.cc > > > @@ -508,9 +508,52 @@ chrec_fold_multiply (tree type, > > > CASE_CONVERT: > > > if (tree_contains_chrecs (op0, NULL)) > > > { > > > + tree inner = TREE_OPERAND (op0, 0); > > > + tree inner_type = TREE_TYPE (inner); > > > + > > > + /* Keep widening unsigned multiplies of affine CHRECs affine. > > > + This handles byte-offset computations such as > > > + (unsigned T) {base, +, step} * C and fold these into > > > + {(unsigned T) base * C, +, (unsigned T) step * C}. */ > > > + if (evolution_function_is_affine_p (inner) > > > + /* The CHREC we're trying to distribute the cast into must be > > > + affine already. */ > > > + && tree_does_not_contain_chrecs (op1) > > > + && INTEGRAL_TYPE_P (type) > > > + && INTEGRAL_TYPE_P (inner_type) > > > + /* Must be unsigned so we don't introduce any UB. */ > > > + && TYPE_UNSIGNED (type) > > > + /* The outer type must at least as wide than the inner type so we > > > + don't truncate when we fold and must the inner CHREC must > > be > > > + non-wrapping so we don't change the behavior when folding > > to > > > + a wider type. */ > > > + && TYPE_PRECISION (type) >= TYPE_PRECISION (inner_type) > > > + && (!TYPE_UNSIGNED (inner_type) > > > + || TYPE_PRECISION (type) == TYPE_PRECISION (inner_type) > > > + || nonwrapping_chrec_p (inner)) > > > + /* The component we are multiplying must be loop invariant > > > + otherwise the base expression can't be simplified and the > > > + resulting CHREC won't be affine. */ > > > + && evolution_function_is_invariant_p (op1, > > > + CHREC_VARIABLE (inner))) > > > + { > > > + tree top1 = chrec_convert (type, op1, NULL); > > > + tree left > > > + = chrec_fold_multiply (type, > > > + chrec_convert (type, CHREC_LEFT (inner), > > > + NULL), top1); > > > + tree right > > > + = chrec_fold_multiply (type, > > > + chrec_convert_rhs (type, > > > + CHREC_RIGHT > > (inner), > > > + NULL), top1); > > > > So what you are basically doing is selectively (only if present > > as multiplication operand), simplify (unsigned T){x, +, s} to > > {(unsigned T)x, +, (unsinged T)s}. > > > > chrec_convert_1 has some similar "tricks" below keep_cast:, > > specifically this is in the class of us not generally widening > > operations because of costs, but for SCEV analysis it's better > > than giving up. > > > > So I think this is better done in chrec_convert_1. > > Thanks, will do. I was worried doing it there since I thought that > unless we have a benefit to it, the folded CHREC could be a > worse representation with the cast in the step. > > I do see why you want it there though as the existing multiply code > can handle it. But any concerns with codegen between > > (unsigned T){x, +, s} and {(unsigned T)x, +, (unsinged T)s} ?
Well, the former isn't affine, so it does not end up anywhere and is unhelpful. The latter eventually lets us progress. > > Thanks, > Tamar > > > > > Richard. > > > > > + return build_polynomial_chrec (CHREC_VARIABLE (inner), > > > + left, right); > > > + } > > > + > > > /* We can strip sign-conversions to signed by performing the > > > operation in unsigned. */ > > > - tree optype = TREE_TYPE (TREE_OPERAND (op0, 0)); > > > + tree optype = inner_type; > > > if (INTEGRAL_TYPE_P (type) > > > && INTEGRAL_TYPE_P (optype) > > > && tree_nop_conversion_p (type, optype) > > > > > > > > > > > > > -- > > Richard Biener <[email protected]> > > SUSE Software Solutions Germany GmbH, > > Frankenstrasse 146, 90461 Nuernberg, Germany; > > GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG > > Nuernberg) > -- Richard Biener <[email protected]> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Jochen Jaser, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
