On Thu, 25 Jan 2018, Richard Biener wrote:

> On Thu, 25 Jan 2018, Marc Glisse wrote:
>
> > On Thu, 25 Jan 2018, Richard Biener wrote:
> >
> > > --- gcc/match.pd	(revision 257047)
> > > +++ gcc/match.pd	(working copy)
> > > @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >      (minus (convert (view_convert:stype @1))
> > > 	     (convert (view_convert:stype @2)))))))
> > >
> > > +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
> > > +   Modeled after fold_plusminus_mult_expr.  */
> > > +(if (!TYPE_SATURATING (type)
> > > +     && (!FLOAT_TYPE_P (type) || flag_associative_math))
> > > + (for plusminus (plus minus)
> > > +  (simplify
> > > +   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
> >
> > No :c on the first mult, so we don't actually handle A*C+B*C?
>
> Hmm, I somehow convinced myself that it's only necessary on one
> of the mults... but you are of course right.  Will re-test with
> that fixed.
This is what I have applied.

Note I had to XFAIL a minor part of gcc.dg/tree-ssa/loop-15.c, namely
simplifying the final value replacement down to n * n.  While the patch
should enable this, the place where the transform could trigger with
enough information about n is VRP, but that doesn't end up folding all
stmts - and that's something I'm not willing to change right now.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk.

Richard.

2018-01-26  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/81082
	* fold-const.c (fold_plusminus_mult_expr): Do not perform the
	association if it requires casting to unsigned.
	* match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
	from fold_plusminus_mult_expr to catch important cases late when
	range info is available.

	* gcc.dg/vect/pr81082.c: New testcase.
	* gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n
	+ n simplification to n * n.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	(revision 257047)
+++ gcc/fold-const.c	(working copy)
@@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
       /* Same may be zero and thus the operation 'code' may overflow.  Likewise
	 same may be minus one and thus the multiplication may overflow.  Perform
-	 the operations in an unsigned type.  */
+	 the sum operation in an unsigned type.  */
       tree utype = unsigned_type_for (type);
       tree tem = fold_build2_loc (loc, code, utype,
				   fold_convert_loc (loc, utype, alt0),
@@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
	return fold_build2_loc (loc, MULT_EXPR, type,
				fold_convert (type, tem), same);

-      return fold_convert_loc (loc, type,
-			       fold_build2_loc (loc, MULT_EXPR, utype, tem,
-						fold_convert_loc (loc, utype, same)));
+      /* Do not resort to unsigned multiplication because
+	 we lose the no-overflow property of the expression.  */
+      return NULL_TREE;
     }

 /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	(revision 257047)
+++ gcc/match.pd	(working copy)
@@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
     (minus (convert (view_convert:stype @1))
	    (convert (view_convert:stype @2)))))))

+/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
+   Modeled after fold_plusminus_mult_expr.  */
+(if (!TYPE_SATURATING (type)
+     && (!FLOAT_TYPE_P (type) || flag_associative_math))
+ (for plusminus (plus minus)
+  (simplify
+   (plusminus (mult:cs @0 @1) (mult:cs @0 @2))
+   (if (!ANY_INTEGRAL_TYPE_P (type)
+	|| TYPE_OVERFLOW_WRAPS (type)
+	|| (INTEGRAL_TYPE_P (type)
+	    && tree_expr_nonzero_p (@0)
+	    && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+    (mult (plusminus @1 @2) @0)))
+  /* We cannot generate constant 1 for fract.  */
+  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
+   (simplify
+    (plusminus @0 (mult:cs @0 @2))
+    (if (!ANY_INTEGRAL_TYPE_P (type)
+	 || TYPE_OVERFLOW_WRAPS (type)
+	 || (INTEGRAL_TYPE_P (type)
+	     && tree_expr_nonzero_p (@0)
+	     && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+     (mult (plusminus { build_one_cst (type); } @2) @0)))
+   (simplify
+    (plusminus (mult:cs @0 @2) @0)
+    (if (!ANY_INTEGRAL_TYPE_P (type)
+	 || TYPE_OVERFLOW_WRAPS (type)
+	 || (INTEGRAL_TYPE_P (type)
+	     && tree_expr_nonzero_p (@0)
+	     && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
+     (mult (plusminus @2 { build_one_cst (type); }) @0))))))

 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */
Index: gcc/testsuite/gcc.dg/vect/pr81082.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/pr81082.c	(revision 0)
+++ gcc/testsuite/gcc.dg/vect/pr81082.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+int
+f (int *x, int b1, int b2, int b3)
+{
+  int foo = 0;
+  for (int i1 = 0; i1 < b1; ++i1)
+    for (int i2 = 0; i2 < b2; ++i2)
+      for (int i3 = 0; i3 < b3; ++i3)
+	foo += x[i1 * b2 * b3 + i2 * b3 + (i3 - 1)];
+  return foo;
+}
+
+/* { dg-final { scan-tree-dump "vectorized 1 loops in function" "vect" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/loop-15.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/loop-15.c	(revision 257048)
+++ gcc/testsuite/gcc.dg/tree-ssa/loop-15.c	(working copy)
@@ -19,7 +19,7 @@ int bla(void)
 }

 /* Since the loop is removed, there should be no addition.  */
-/* { dg-final { scan-tree-dump-times " \\+ " 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times " \\+ " 0 "optimized" { xfail *-*-* } } } */

 /* { dg-final { scan-tree-dump-times " \\* " 1 "optimized" } } */

 /* The if from the loop header copying remains in the code. */