On Thu, Oct 26, 2023 at 2:29 AM Richard Biener <richard.guent...@gmail.com> wrote: > > On Wed, Oct 25, 2023 at 5:51 AM Andrew Pinski <pins...@gmail.com> wrote: > > > > I noticed we were missing optimizing `a / (1 << b)` when > > we know that a is nonnegative but only due to ranger information. > > This adds the use of the global ranger to tree_single_nonnegative_warnv_p > > for SSA_NAME. > > I didn't extend tree_single_nonnegative_warnv_p to use the ranger for > > floating > > point nor to use the local ranger since I am not 100% sure it is safe where > > all of the uses tree_expr_nonnegative_p would be safe. > > > > Note pr80776-1.c testcase fails again due to vrp's bad handling of setting > > global ranges from __builtin_unreachable. It just happened to be optimized > > before due to global ranges not being used as much. > > > > Bootstrapped and tested on x86_64-linux-gnu with no regressions. > > > > PR tree-optimization/111959 > > > > gcc/ChangeLog: > > > > * fold-const.cc (tree_single_nonnegative_warnv_p): Use > > the global range to see if the SSA_NAME was nonnegative. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.dg/tree-ssa/forwprop-42.c: New test. > > * gcc.dg/pr80776-1.c: xfail and update comment. > > --- > > gcc/fold-const.cc | 36 +++++++++++++++------ > > gcc/testsuite/gcc.dg/pr80776-1.c | 8 ++--- > > gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c | 15 +++++++++ > > 3 files changed, 46 insertions(+), 13 deletions(-) > > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c > > > > diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc > > index 40767736389..2a2a90230f5 100644 > > --- a/gcc/fold-const.cc > > +++ b/gcc/fold-const.cc > > @@ -15047,15 +15047,33 @@ tree_single_nonnegative_warnv_p (tree t, bool > > *strict_overflow_p, int depth) > > return RECURSE (TREE_OPERAND (t, 1)) && RECURSE (TREE_OPERAND (t, > > 2)); > > > > case SSA_NAME: > > - /* Limit the depth of recursion to avoid quadratic behavior. > > - This is expected to catch almost all occurrences in practice. > > - If this code misses important cases that unbounded recursion > > - would not, passes that need this information could be revised > > - to provide it through dataflow propagation. */ > > - return (!name_registered_for_update_p (t) > > - && depth < param_max_ssa_name_query_depth > > - && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t), > > - strict_overflow_p, > > depth)); > > + { > > + /* For integral types, querry the global range if possible. */ > > query > > > + if (INTEGRAL_TYPE_P (TREE_TYPE (t))) > > + { > > + value_range vr; > > + if (get_global_range_query ()->range_of_expr (vr, t) > > + && !vr.varying_p () && !vr.undefined_p ()) > > + { > > + /* If the range is nonnegative, return true. */ > > + if (vr.nonnegative_p ()) > > + return true; > > + > > + /* If the range is non-positive, then return false. */ > > + if (vr.nonpositive_p ()) > > + return false; > > That's testing for <= 0, nonnegative for >= 0. This means when > vr.nonpositive_p () the value could still be zero (and nonnegative), > possibly be figured out by the recursion below. > > Since we don't have negative_p () do we want to test > nonpositive_p () && nonzero_p () instead?
I was thinking about that when I was writing the patch. If the ranger figured out the value was zero, nonnegative_p would have returned true. So while yes nonpositive_p() would return true but we already checked nonnegative_p beforehand and the nonzero_p would not matter. Now the question is if after nonnegative_p we check if the range could contain 0 still is that worth the recursion. The whole idea of returning false was to remove the need from recursion as much. Thanks, Andrew > > OK with that change. > > Richard. > > > + } > > + } > > + /* Limit the depth of recursion to avoid quadratic behavior. > > + This is expected to catch almost all occurrences in practice. > > + If this code misses important cases that unbounded recursion > > + would not, passes that need this information could be revised > > + to provide it through dataflow propagation. */ > > + return (!name_registered_for_update_p (t) > > + && depth < param_max_ssa_name_query_depth > > + && gimple_stmt_nonnegative_warnv_p (SSA_NAME_DEF_STMT (t), > > + strict_overflow_p, > > depth)); > > + } > > > > default: > > return tree_simple_nonnegative_warnv_p (TREE_CODE (t), TREE_TYPE > > (t)); > > diff --git a/gcc/testsuite/gcc.dg/pr80776-1.c > > b/gcc/testsuite/gcc.dg/pr80776-1.c > > index b9bce62d982..f3d47aeda36 100644 > > --- a/gcc/testsuite/gcc.dg/pr80776-1.c > > +++ b/gcc/testsuite/gcc.dg/pr80776-1.c > > @@ -18,14 +18,14 @@ Foo (void) > > if (! (0 <= i && i <= 999999)) > > __builtin_unreachable (); > > > > - /* Legacy evrp sets the range of i to [0, MAX] *before* the first > > conditional, > > + /* vrp1 sets the range of i to [0, MAX] *before* the first conditional, > > and to [0,999999] *before* the second conditional. This is because > > both > > - evrp and VRP use trickery to set global ranges when this particular > > use of > > + vrp use trickery to set global ranges when this particular use of > > a __builtin_unreachable is in play (see uses of > > assert_unreachable_fallthru_edge_p). > > > > - Setting these ranges at the definition site, causes VRP to remove the > > + Setting these ranges at the definition site, causes other passes to > > remove the > > unreachable code altogether, leaving the following sprintf unguarded. > > This > > causes the bogus warning below. */ > > - sprintf (number, "%d", i); /* { dg-bogus "writing" "" } */ > > + sprintf (number, "%d", i); /* { dg-bogus "writing" "" { xfail *-*-* } } > > */ > > } > > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c > > b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c > > new file mode 100644 > > index 00000000000..4e5421ed4d4 > > --- /dev/null > > +++ b/gcc/testsuite/gcc.dg/tree-ssa/forwprop-42.c > > @@ -0,0 +1,15 @@ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > > +/* PR tree-optimization/111959 */ > > + > > +int divbypow2(int a, int b) > > +{ > > + if (a & ~0xff) __builtin_unreachable(); > > + return a / (1<<b); > > +} > > + > > +/* divbypow2 should be able to optimize to just a/b as a is known to be > > always positive. */ > > +/* { dg-final { scan-tree-dump-not " / " "optimized" } } */ > > +/* { dg-final { scan-tree-dump-not " << " "optimized" } } */ > > +/* { dg-final { scan-tree-dump-times " >> " 1 "optimized" } } */ > > + > > -- > > 2.39.3 > >