On Fri, 17 May 2024, Manolis Tsamis wrote:

> On Fri, May 17, 2024 at 12:22 PM Richard Biener <rguent...@suse.de> wrote:
> >
> > On Fri, 17 May 2024, Manolis Tsamis wrote:
> >
> > > Hi Richard,
> > >
> > > While I was re-testing the latest version of this patch I noticed that
> > > it FAILs an AArch64 test, gcc.target/aarch64/subsp.c. With the patch
> > > we generate one instruction more:
> > >
> > >         sbfiz   x1, x1, 4, 32
> > >         stp     x29, x30, [sp, -16]!
> > >         add     x1, x1, 16
> > >         mov     x29, sp
> > >         sub     sp, sp, x1
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > Instead of:
> > >
> > >         stp     x29, x30, [sp, -16]!
> > >         add     w1, w1, 1
> > >         mov     x29, sp
> > >         sub     sp, sp, w1, sxtw 4
> > >         mov     x0, sp
> > >         bl      foo
> > >
> > > I've looked at it but can't really find a way to solve the regression.
> > > Any thoughts on this?
> >
> > Can you explain what goes wrong?  As I said rewriting parts of
> > address calculation is tricky, there's always the chance that some
> > cases regress (see your observation in comment#4 of the PR).
> >
> 
> In this case the int -> sizetype cast ends up happening earlier. Instead of
> 
>   _7 = y_6(D) + 1;
>   _1 = (sizetype) _7;
>   _2 = _1 * 16;
> 
> We get
> 
>   _13 = (sizetype) y_6(D);
>   _15 = _13 + 1;
>   _2 = _15 * 16;
> 
> and then in RTL we have
> 
> x1 = ((sizetype) x1) << 4
> sp = sp - (x1 + 16)
> 
> instead of
> 
> x1 = x1 + 1
> sp = sp - ((sizetype) x1) << 4
> 
> which doesn't form sub sp, sp, w1, sxtw 4.
> 
> But more importantly, I realized that (in this case among others) the
> pattern is undone by (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A
> -> A * (C+-1). AFAIK having one pattern and its reverse is a bad thing
> so something needs to be changed.

Yes, we have that issue.  And we've guarded GIMPLE vs. non-GIMPLE and
have recursion limits in match to deal with this.  But yes, having
both is bad.  I'd say that clearly patterns reducing the number of
operations are good at least for canonicalization.

> One idea could be to only keep the larger one ((T)(A + CST1)) * CST2 +
> CST3 -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3). it's not enough to
> deal with the testcases of the ticket but it does help in other cases.

The issue with such larger patterns is that they hint at the fact
the transform should happen with an eye on more than just the
small expresion.  Thus not in match.pd but in a pass like reassoc
or SLSR or IVOPTs or even CSE itself.  We also have to avoid
doing changes that cannot be undone when canonicalizing.

Richard.

> Manolis
> 
> > Note that I still believe that avoiding the early and premature
> > promotion of the addition to unsigned is a good thing.
> >
> > Note the testcase in the PR is fixed with -fwrapv because then
> > we do _not_ perform this premature optimization.  Without -fwrapv
> > the optimization is valid but as you note we do not perform it
> > consistently - otherwise we wouldn't regress.
> >
> > Richard.
> >
> >
> >
> > > Thanks,
> > > Manolis
> > >
> > >
> > >
> > > On Thu, May 16, 2024 at 11:15 AM Richard Biener
> > > <richard.guent...@gmail.com> wrote:
> > > >
> > > > On Tue, May 14, 2024 at 10:58 AM Manolis Tsamis 
> > > > <manolis.tsa...@vrull.eu> wrote:
> > > > >
> > > > > New patch with the requested changes can be found below.
> > > > >
> > > > > I don't know how much this affects SCEV, but I do believe that we
> > > > > should incorporate this change somehow. I've seen various cases of
> > > > > suboptimal address calculation codegen that boil down to this.
> > > >
> > > > This misses the ChangeLog (I assume it's unchanged) and indent
> > > > of the match.pd part is now off.
> > > >
> > > > Please fix that, the patch is OK with that change.
> > > >
> > > > Thanks,
> > > > Richard.
> > > >
> > > > > gcc/match.pd | 31 +++++++++++++++++++++++++++++++
> > > > > gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > 2 files changed, 47 insertions(+)
> > > > > create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > >
> > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > index 07e743ae464..1d642c205f0 100644
> > > > > --- a/gcc/match.pd
> > > > > +++ b/gcc/match.pd
> > > > > @@ -3650,6 +3650,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > (plus (convert @0) (op @2 (convert @1))))))
> > > > > #endif
> > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > + -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > + Where (A + CST1) doesn't need to have a single use. */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (plus (mult:s (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + INTEGER_CST@3)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (plus (mult (convert @1) @2) @3)))))
> > > > > +#endif
> > > > > +
> > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2) */
> > > > > +#if GIMPLE
> > > > > + (for op (plus minus)
> > > > > + (simplify
> > > > > + (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > + (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> > > > > + && INTEGRAL_TYPE_P (type)
> > > > > + && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > + && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > + && TYPE_OVERFLOW_WRAPS (type))
> > > > > + (op (mult (convert @0) @2) (mult (convert @1) @2)))))
> > > > > +#endif
> > > > > +
> > > > > /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be 
> > > > > simplified
> > > > > to a simple value. */
> > > > > (for op (plus minus)
> > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c 
> > > > > b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > new file mode 100644
> > > > > index 00000000000..e9051273672
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > @@ -0,0 +1,16 @@
> > > > > +/* PR tree-optimization/109393 */
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> > > > > +
> > > > > +int foo(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return a[j - 1] == a[k];
> > > > > +}
> > > > > +
> > > > > +int bar(int *a, int j)
> > > > > +{
> > > > > + int k = j - 1;
> > > > > + return (&a[j + 1] - 2) == &a[k];
> > > > > +}
> > > > > --
> > > > > 2.44.0
> > > > >
> > > > >
> > > > > On Tue, Apr 23, 2024 at 1:33 PM Manolis Tsamis 
> > > > > <manolis.tsa...@vrull.eu> wrote:
> > > > > >
> > > > > > The original motivation for this pattern was that the following 
> > > > > > function does
> > > > > > not fold to 'return 1':
> > > > > >
> > > > > > int foo(int *a, int j)
> > > > > > {
> > > > > >   int k = j - 1;
> > > > > >   return a[j - 1] == a[k];
> > > > > > }
> > > > > >
> > > > > > The expression ((unsigned long) (X +- C1) * C2) appears frequently 
> > > > > > as part of
> > > > > > address calculations (e.g. arrays). These patterns help fold and 
> > > > > > simplify more
> > > > > > expressions.
> > > > > >
> > > > > >         PR tree-optimization/109393
> > > > > >
> > > > > > gcc/ChangeLog:
> > > > > >
> > > > > >         * match.pd: Add new patterns for ((T)(A +- CST1)) * CST2 and
> > > > > >           ((T)(A +- CST1)) * CST2 + CST3.
> > > > > >
> > > > > > gcc/testsuite/ChangeLog:
> > > > > >
> > > > > >         * gcc.dg/pr109393.c: New test.
> > > > > >
> > > > > > Signed-off-by: Manolis Tsamis <manolis.tsa...@vrull.eu>
> > > > > > ---
> > > > > >
> > > > > >  gcc/match.pd                    | 30 ++++++++++++++++++++++++++++++
> > > > > >  gcc/testsuite/gcc.dg/pr109393.c | 16 ++++++++++++++++
> > > > > >  2 files changed, 46 insertions(+)
> > > > > >  create mode 100644 gcc/testsuite/gcc.dg/pr109393.c
> > > > > >
> > > > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > > > index d401e7503e6..13c828ba70d 100644
> > > > > > --- a/gcc/match.pd
> > > > > > +++ b/gcc/match.pd
> > > > > > @@ -3650,6 +3650,36 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > > > >         (plus (convert @0) (op @2 (convert @1))))))
> > > > > >  #endif
> > > > > >
> > > > > > +/* ((T)(A + CST1)) * CST2 + CST3
> > > > > > +     -> ((T)(A) * CST2) + ((T)CST1 * CST2 + CST3)
> > > > > > +   Where (A + CST1) doesn't need to have a single use.  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (plus (mult (convert:s (op @0 INTEGER_CST@1)) INTEGER_CST@2) 
> > > > > > INTEGER_CST@3)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (plus (mult @2 (convert @1)) 
> > > > > > @3)))))
> > > > > > +#endif
> > > > > > +
> > > > > > +/* ((T)(A + CST1)) * CST2 -> ((T)(A) * CST2) + ((T)CST1 * CST2)  */
> > > > > > +#if GIMPLE
> > > > > > +  (for op (plus minus)
> > > > > > +   (simplify
> > > > > > +    (mult (convert:s (op:s @0 INTEGER_CST@1)) INTEGER_CST@2)
> > > > > > +     (if (TREE_CODE (TREE_TYPE (@0)) == INTEGER_TYPE
> > > > > > +         && TREE_CODE (type) == INTEGER_TYPE
> > > > > > +         && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (@0))
> > > > > > +         && !TYPE_OVERFLOW_SANITIZED (TREE_TYPE (@0))
> > > > > > +         && TYPE_OVERFLOW_WRAPS (type))
> > > > > > +       (op (mult @2 (convert @0)) (mult @2 (convert @1))))))
> > > > > > +#endif
> > > > > > +
> > > > > >  /* (T)(A) +- (T)(B) -> (T)(A +- B) only when (A +- B) could be 
> > > > > > simplified
> > > > > >     to a simple value.  */
> > > > > >    (for op (plus minus)
> > > > > > diff --git a/gcc/testsuite/gcc.dg/pr109393.c 
> > > > > > b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > new file mode 100644
> > > > > > index 00000000000..e9051273672
> > > > > > --- /dev/null
> > > > > > +++ b/gcc/testsuite/gcc.dg/pr109393.c
> > > > > > @@ -0,0 +1,16 @@
> > > > > > +/* PR tree-optimization/109393 */
> > > > > > +/* { dg-do compile } */
> > > > > > +/* { dg-options "-O2 -fdump-tree-optimized" } */
> > > > > > +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } 
> > > > > > */
> > > > > > +
> > > > > > +int foo(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return a[j - 1] == a[k];
> > > > > > +}
> > > > > > +
> > > > > > +int bar(int *a, int j)
> > > > > > +{
> > > > > > +  int k = j - 1;
> > > > > > +  return (&a[j + 1] - 2) == &a[k];
> > > > > > +}
> > > > > > --
> > > > > > 2.34.1
> > > > > >
> > >
> >
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

Reply via email to