> Hmm, code does not match comment? I think you want r.nonzero_p ().
> But that also means that at (*) we cannot simply leave 'r' to be the
> range of the unshifted src.
>
>> + assumptions = boolean_true_node;
>> + }
>> +
>> + /* If that didn't work use simplify_using_initial_conditions. */
>> + if (assumptions == boolean_false_node)
Yeah, I originally used the path-based ranger and had an inverse condition.
Then I removed it and swapped the condition, realizing that things still work.
But they don't actually... as only a path-based approach has enough information
but the boundary conditions here are constricted enough that my test case still
passes (similar to v1) because nonzero_p is also the wrong check...
I re-introduced the path-based approach now, using a helper function, and
re-tested everything on x86, rv64gcv_zvl512b, and power10 in the attached v3.
Regards
Robin
[PATCH v3] niter: Use range to query ctz range.
PR/tree-optimization 122207
gcc/ChangeLog:
* tree-ssa-loop-niter.cc (shifted_range_nonzero_p): New
function.
(number_of_iterations_cltz): Call new function.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ctz-char.c: Remove -fno-tree-ch.
* gcc.dg/tree-ssa/ctz-complement-char.c: Ditto.
* gcc.dg/tree-ssa/ctz-complement-int.c: Ditto.
* gcc.dg/tree-ssa/ctz-complement-long-long.c: Ditto.
* gcc.dg/tree-ssa/ctz-complement-long.c: Ditto.
* gcc.dg/tree-ssa/ctz-int.c: Ditto.
* gcc.dg/tree-ssa/ctz-long-long.c: Ditto.
* gcc.dg/tree-ssa/ctz-long.c: Ditto.
* gcc.dg/tree-ssa/ctz-ch.c: New test.
---
gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c | 23 ++++
gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c | 2 +-
.../gcc.dg/tree-ssa/ctz-complement-char.c | 2 +-
.../gcc.dg/tree-ssa/ctz-complement-int.c | 2 +-
.../tree-ssa/ctz-complement-long-long.c | 2 +-
.../gcc.dg/tree-ssa/ctz-complement-long.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c | 2 +-
gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c | 2 +-
gcc/tree-ssa-loop-niter.cc | 104 +++++++++++++++++-
10 files changed, 132 insertions(+), 11 deletions(-)
create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
new file mode 100644
index 00000000000..5d725979971
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+typedef unsigned long BITMAP_WORD;
+
+bool
+bmp_iter_set (BITMAP_WORD bits, unsigned *bit_no)
+{
+ /* If our current word is nonzero, it contains the bit we want. */
+ if (bits)
+ {
+ while (!(bits & 1))
+ {
+ bits >>= 1;
+ *bit_no += 1;
+ }
+ return true;
+ }
+
+ return false;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" } }
*/
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
index 3cd166acbd4..fa8b7f39de4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctz } */
-/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
index b9afe8852d8..5ebc3213169 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctz } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
index d2702a65daf..0ce4b6beaa7 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctz } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
index 1ea0d5d7d9f..f98bec039b3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctzll } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
index 80fb02dcfa6..8edb3728131 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctzl } */
-/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_LONG__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
index 7f63493eb73..2bf3ae69b93 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctz } */
-/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
index 924f61b76f0..2e159485cb9 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctzll } */
-/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
index 178945daa8a..2e3be652a0b 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
@@ -1,6 +1,6 @@
/* { dg-do run } */
/* { dg-require-effective-target ctzl } */
-/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
#define PREC (__CHAR_BIT__ * __SIZEOF_LONG__)
diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
index 6e130862549..9d62260540d 100644
--- a/gcc/tree-ssa-loop-niter.cc
+++ b/gcc/tree-ssa-loop-niter.cc
@@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see
#include "tree-dfa.h"
#include "internal-fn.h"
#include "gimple-range.h"
+#include "gimple-range-path.h"
#include "sreal.h"
@@ -2321,6 +2322,54 @@ is_rshift_by_1 (gassign *stmt)
return false;
}
+/* Helper for number_of_iterations_cltz that uses path-based ranger to
+ determine if SRC, shifted left (when LEFT_SHIFT is true) or right
+ by NUM_IGNORED_BITS, is guaranteed to be != 0 inside LOOP.
+ Return true if so or false otherwise. */
+
+static bool
+shifted_range_nonzero_p (loop_p loop, tree src,
+ bool left_shift, int num_ignored_bits)
+{
+ int_range_max r (TREE_TYPE (src));
+ gcc_assert (num_ignored_bits >= 0);
+
+ auto_vec<basic_block, 2> path;
+ path.safe_push (loop->header);
+ path.safe_push (loop_preheader_edge (loop)->src);
+
+ bool range_nonzero = false;
+ gimple_ranger ranger;
+ path_range_query query (ranger, path);
+
+ if (query.range_of_expr (r, src)
+ && !r.varying_p ()
+ && !r.undefined_p ())
+ {
+ if (num_ignored_bits)
+ {
+ range_op_handler op (left_shift ? LSHIFT_EXPR : RSHIFT_EXPR);
+ int_range_max shifted_range (TREE_TYPE (src));
+ wide_int shift_count = wi::shwi (num_ignored_bits,
+ TYPE_PRECISION (TREE_TYPE
+ (src)));
+ int_range_max shift_amount
+ (TREE_TYPE (src), shift_count, shift_count);
+
+ if (op.fold_range (shifted_range, TREE_TYPE (src), r,
+ shift_amount))
+ r = shifted_range;
+ }
+
+ /* If the range does not contain zero we are good. */
+ if (!range_includes_zero_p (r))
+ range_nonzero = true;
+ }
+
+ return range_nonzero;
+}
+
+
/* See comment below for number_of_iterations_bitcount.
For c[lt]z, we have:
@@ -2438,6 +2487,9 @@ number_of_iterations_cltz (loop_p loop, edge exit,
tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
int src_precision = TYPE_PRECISION (TREE_TYPE (src));
+ /* Save the original SSA name before preprocessing for ranger queries. */
+ tree unshifted_src = src;
+
/* Apply any needed preprocessing to src. */
int num_ignored_bits;
if (left_shift)
@@ -2463,10 +2515,56 @@ number_of_iterations_cltz (loop_p loop, edge exit,
expr = fold_convert (unsigned_type_node, expr);
- tree assumptions = fold_build2 (NE_EXPR, boolean_type_node, src,
- build_zero_cst (TREE_TYPE (src)));
+ /* If the copy-header (ch) pass peeled one iteration we're shifting
+ SRC by preprocessing it above.
+
+ A loop like
+ if (bits)
+ {
+ while (!(bits & 1))
+ {
+ bits >>= 1;
+ cnt += 1;
+ }
+ return cnt;
+ }
+ ch (roughly) transforms into:
+ if (bits)
+ {
+ if (!(bits & 1)
+ {
+ do
+ {
+ bits >>= 1;
+ cnt += 1;
+ } while (!(bits & 1));
+ }
+ else
+ cnt = 1;
+ return cnt;
+ }
+
+ Then, our preprocessed SRC (that is used for c[tl]z computation)
+ will be bits >> 1, and the assumption is bits >> 1 != 0. */
+
+ /* As we don't have a gimple statement to query, use a path-based range
+ query from the preheader to the loop header to get the range of the
+ unshifted SRC, then apply the shift manually and check if the
+ resulting range is != 0. */
+ tree assumptions;
+ if (shifted_range_nonzero_p (loop, unshifted_src,
+ left_shift, num_ignored_bits))
+ assumptions = boolean_true_node;
+ else
+ {
+ /* If ranger couldn't prove the assumption, try
+ simplify_using_initial_conditions. */
+ assumptions = fold_build2 (NE_EXPR, boolean_type_node, src,
+ build_zero_cst (TREE_TYPE (src)));
+ assumptions = simplify_using_initial_conditions (loop, assumptions);
+ }
- niter->assumptions = simplify_using_initial_conditions (loop, assumptions);
+ niter->assumptions = assumptions;
niter->may_be_zero = boolean_false_node;
niter->niter = simplify_using_initial_conditions (loop, expr);
--
2.51.0