On Mon, Oct 27, 2025 at 3:25 PM Robin Dapp <[email protected]> wrote:
>
> > Hmm, code does not match comment? I think you want r.nonzero_p ().
> > But that also means that at (*) we cannot simply leave 'r' to be the
> > range of the unshifted src.
> >
> >> + assumptions = boolean_true_node;
> >> + }
> >> +
> >> + /* If that didn't work use simplify_using_initial_conditions. */
> >> + if (assumptions == boolean_false_node)
>
> Yeah, I originally used the path-based ranger and had an inverse condition.
> Then I removed it and swapped the condition, realizing that things still work.
> But they don't actually... as only a path-based approach has enough
> information
> but the boundary conditions here are constricted enough that my test case
> still
> passes (similar to v1) because nonzero_p is also the wrong check...
>
> I re-introduced the path-based approach now, using a helper function, and
> re-tested everything on x86, rv64gcv_zvl512b, and power10 in the attached v3.
Hmm. I think that non-path ranger should work as well if you provide it with a
context (gimple stmt). The context in this case should be the loop exit stmt
we are analyzing (the gcond) I think. OTOH what you model in the path ranger
is basically the loop entry edge, so you could use range_on_edge of the entry
edge as well?
I hoped Andrew would comment on the patch, maybe he can on the above.
I'll note that niter analysis currently does not have an active ranger on its
own (but only possibly its clients). For CTZ/CLZ replacement that's
SCCP, so if required we could enable ranger in that pass.
Richard.
> Regards
> Robin
>
>
> [PATCH v3] niter: Use range to query ctz range.
>
> PR/tree-optimization 122207
>
> gcc/ChangeLog:
>
> * tree-ssa-loop-niter.cc (shifted_range_nonzero_p): New
> function.
> (number_of_iterations_cltz): Call new function.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/ctz-char.c: Remove -fno-tree-ch.
> * gcc.dg/tree-ssa/ctz-complement-char.c: Ditto.
> * gcc.dg/tree-ssa/ctz-complement-int.c: Ditto.
> * gcc.dg/tree-ssa/ctz-complement-long-long.c: Ditto.
> * gcc.dg/tree-ssa/ctz-complement-long.c: Ditto.
> * gcc.dg/tree-ssa/ctz-int.c: Ditto.
> * gcc.dg/tree-ssa/ctz-long-long.c: Ditto.
> * gcc.dg/tree-ssa/ctz-long.c: Ditto.
> * gcc.dg/tree-ssa/ctz-ch.c: New test.
> ---
> gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c | 23 ++++
> gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c | 2 +-
> .../gcc.dg/tree-ssa/ctz-complement-char.c | 2 +-
> .../gcc.dg/tree-ssa/ctz-complement-int.c | 2 +-
> .../tree-ssa/ctz-complement-long-long.c | 2 +-
> .../gcc.dg/tree-ssa/ctz-complement-long.c | 2 +-
> gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c | 2 +-
> gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c | 2 +-
> gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c | 2 +-
> gcc/tree-ssa-loop-niter.cc | 104 +++++++++++++++++-
> 10 files changed, 132 insertions(+), 11 deletions(-)
> create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
> new file mode 100644
> index 00000000000..5d725979971
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-ch.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +typedef unsigned long BITMAP_WORD;
> +
> +bool
> +bmp_iter_set (BITMAP_WORD bits, unsigned *bit_no)
> +{
> + /* If our current word is nonzero, it contains the bit we want. */
> + if (bits)
> + {
> + while (!(bits & 1))
> + {
> + bits >>= 1;
> + *bit_no += 1;
> + }
> + return true;
> + }
> +
> + return false;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "__builtin_ctz|\\.CTZ" 1 "optimized" }
> } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
> index 3cd166acbd4..fa8b7f39de4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-char.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctz } */
> -/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
> index b9afe8852d8..5ebc3213169 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-char.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctz } */
> -/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
> index d2702a65daf..0ce4b6beaa7 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-int.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctz } */
> -/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
> index 1ea0d5d7d9f..f98bec039b3 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long-long.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctzll } */
> -/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
> index 80fb02dcfa6..8edb3728131 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-complement-long.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctzl } */
> -/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_LONG__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
> index 7f63493eb73..2bf3ae69b93 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-int.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctz } */
> -/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_INT__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
> index 924f61b76f0..2e159485cb9 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long-long.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctzll } */
> -/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_LONG_LONG__)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
> b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
> index 178945daa8a..2e3be652a0b 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ctz-long.c
> @@ -1,6 +1,6 @@
> /* { dg-do run } */
> /* { dg-require-effective-target ctzl } */
> -/* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
>
> #define PREC (__CHAR_BIT__ * __SIZEOF_LONG__)
>
> diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> index 6e130862549..9d62260540d 100644
> --- a/gcc/tree-ssa-loop-niter.cc
> +++ b/gcc/tree-ssa-loop-niter.cc
> @@ -45,6 +45,7 @@ along with GCC; see the file COPYING3. If not see
> #include "tree-dfa.h"
> #include "internal-fn.h"
> #include "gimple-range.h"
> +#include "gimple-range-path.h"
> #include "sreal.h"
>
>
> @@ -2321,6 +2322,54 @@ is_rshift_by_1 (gassign *stmt)
> return false;
> }
>
> +/* Helper for number_of_iterations_cltz that uses path-based ranger to
> + determine if SRC, shifted left (when LEFT_SHIFT is true) or right
> + by NUM_IGNORED_BITS, is guaranteed to be != 0 inside LOOP.
> + Return true if so or false otherwise. */
> +
> +static bool
> +shifted_range_nonzero_p (loop_p loop, tree src,
> + bool left_shift, int num_ignored_bits)
> +{
> + int_range_max r (TREE_TYPE (src));
> + gcc_assert (num_ignored_bits >= 0);
> +
> + auto_vec<basic_block, 2> path;
> + path.safe_push (loop->header);
> + path.safe_push (loop_preheader_edge (loop)->src);
> +
> + bool range_nonzero = false;
> + gimple_ranger ranger;
> + path_range_query query (ranger, path);
> +
> + if (query.range_of_expr (r, src)
> + && !r.varying_p ()
> + && !r.undefined_p ())
> + {
> + if (num_ignored_bits)
> + {
> + range_op_handler op (left_shift ? LSHIFT_EXPR : RSHIFT_EXPR);
> + int_range_max shifted_range (TREE_TYPE (src));
> + wide_int shift_count = wi::shwi (num_ignored_bits,
> + TYPE_PRECISION (TREE_TYPE
> + (src)));
> + int_range_max shift_amount
> + (TREE_TYPE (src), shift_count, shift_count);
> +
> + if (op.fold_range (shifted_range, TREE_TYPE (src), r,
> + shift_amount))
> + r = shifted_range;
> + }
> +
> + /* If the range does not contain zero we are good. */
> + if (!range_includes_zero_p (r))
> + range_nonzero = true;
> + }
> +
> + return range_nonzero;
> +}
> +
> +
> /* See comment below for number_of_iterations_bitcount.
> For c[lt]z, we have:
>
> @@ -2438,6 +2487,9 @@ number_of_iterations_cltz (loop_p loop, edge exit,
> tree src = gimple_phi_arg_def (phi, loop_preheader_edge (loop)->dest_idx);
> int src_precision = TYPE_PRECISION (TREE_TYPE (src));
>
> + /* Save the original SSA name before preprocessing for ranger queries. */
> + tree unshifted_src = src;
> +
> /* Apply any needed preprocessing to src. */
> int num_ignored_bits;
> if (left_shift)
> @@ -2463,10 +2515,56 @@ number_of_iterations_cltz (loop_p loop, edge exit,
>
> expr = fold_convert (unsigned_type_node, expr);
>
> - tree assumptions = fold_build2 (NE_EXPR, boolean_type_node, src,
> - build_zero_cst (TREE_TYPE (src)));
> + /* If the copy-header (ch) pass peeled one iteration we're shifting
> + SRC by preprocessing it above.
> +
> + A loop like
> + if (bits)
> + {
> + while (!(bits & 1))
> + {
> + bits >>= 1;
> + cnt += 1;
> + }
> + return cnt;
> + }
> + ch (roughly) transforms into:
> + if (bits)
> + {
> + if (!(bits & 1)
> + {
> + do
> + {
> + bits >>= 1;
> + cnt += 1;
> + } while (!(bits & 1));
> + }
> + else
> + cnt = 1;
> + return cnt;
> + }
> +
> + Then, our preprocessed SRC (that is used for c[tl]z computation)
> + will be bits >> 1, and the assumption is bits >> 1 != 0. */
> +
> + /* As we don't have a gimple statement to query, use a path-based range
> + query from the preheader to the loop header to get the range of the
> + unshifted SRC, then apply the shift manually and check if the
> + resulting range is != 0. */
> + tree assumptions;
> + if (shifted_range_nonzero_p (loop, unshifted_src,
> + left_shift, num_ignored_bits))
> + assumptions = boolean_true_node;
> + else
> + {
> + /* If ranger couldn't prove the assumption, try
> + simplify_using_initial_conditions. */
> + assumptions = fold_build2 (NE_EXPR, boolean_type_node, src,
> + build_zero_cst (TREE_TYPE (src)));
> + assumptions = simplify_using_initial_conditions (loop, assumptions);
> + }
>
> - niter->assumptions = simplify_using_initial_conditions (loop, assumptions);
> + niter->assumptions = assumptions;
> niter->may_be_zero = boolean_false_node;
> niter->niter = simplify_using_initial_conditions (loop, expr);
>
> --
> 2.51.0
>