On Tue, Aug 16, 2022 at 11:18 AM Richard Biener <rguent...@suse.de> wrote:
>
> On Mon, 15 Aug 2022, Aldy Hernandez wrote:
>
> > On Mon, Aug 15, 2022 at 9:24 PM Andrew MacLeod <amacl...@redhat.com> wrote:
> > >
> > > heh. or just
> > >
> > >
> > > +      int_range<2> r;
> > > +      if (!fold_range (r, const_cast <gcond *> (cond_stmt))
> > > +      || !r.singleton_p (&val))
> > >
> > >
> > > if you do not provide a range_query to any of the fold_using_range code,
> > > it defaults to:
> > >
> > > fur_source::fur_source (range_query *q)
> > > {
> > >    if (q)
> > >      m_query = q;
> > >    else if (cfun)
> > >      m_query = get_range_query (cfun);
> > >    else
> > >      m_query = get_global_range_query ();
> > >    m_gori = NULL;
> > > }
> > >
> >
> > Sweet.  Even better!
>
> So when I do the following incremental change ontop of the posted
> patch then I see that the path query is able to simplify more
> "single BB paths" than the global range folding.
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index 669098e4ec3..777e778037f 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -314,6 +314,12 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>  {
>    int_range_max r;
>
> +  int_range<2> rf;
> +  if (path.length () == 1)
> +    {
> +      fold_range (rf, cond);
> +    }
> +
>    m_solver->compute_ranges (path, m_imports);
>    m_solver->range_of_stmt (r, cond);
>
> @@ -325,6 +331,8 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>
>    if (r == true_range || r == false_range)
>      {
> +      if (path.length () == 1)
> +       gcc_assert  (r == rf);
>        edge e_true, e_false;
>        basic_block bb = gimple_bb (cond);
>        extract_true_false_edges_from_block (bb, &e_true, &e_false);
>
> Even doing the following (not sure what's the difference and in
> particular expense over the path range query) results in missed
> simplifications (checking my set of cc1files).
>
> diff --git a/gcc/tree-ssa-threadbackward.cc
> b/gcc/tree-ssa-threadbackward.cc
> index 669098e4ec3..1d43a179d08 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -99,6 +99,7 @@ private:
>
>    back_threader_registry m_registry;
>    back_threader_profitability m_profit;
> +  gimple_ranger *m_ranger;
>    path_range_query *m_solver;
>
>    // Current path being analyzed.
> @@ -146,12 +147,14 @@ back_threader::back_threader (function *fun,
> unsigned flags, bool first)
>    // The path solver needs EDGE_DFS_BACK in resolving mode.
>    if (flags & BT_RESOLVE)
>      mark_dfs_back_edges ();
> -  m_solver = new path_range_query (flags & BT_RESOLVE);
> +  m_ranger = new gimple_ranger;
> +  m_solver = new path_range_query (flags & BT_RESOLVE, m_ranger);
>  }

Passing an allocated ranger here results in less simplifications over
letting path_range_query allocate its own?  That's not right.  Or do
you mean that using fold_range() with the m_ranger causes ICEs with
your patch (due to the non-null processing described below)?

>
>  back_threader::~back_threader ()
>  {
>    delete m_solver;
> +  delete m_ranger;
>
>    loop_optimizer_finalize ();
>  }
> @@ -314,6 +317,12 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>  {
>    int_range_max r;
>
> +  int_range<2> rf;
> +  if (path.length () == 1)
> +    {
> +      fold_range (rf, cond, m_ranger);
> +    }
> +
>    m_solver->compute_ranges (path, m_imports);
>    m_solver->range_of_stmt (r, cond);
>
> @@ -325,6 +334,8 @@ back_threader::find_taken_edge_cond (const
> vec<basic_block> &path,
>
>    if (r == true_range || r == false_range)
>      {
> +      if (path.length () == 1)
> +       gcc_assert  (r == rf);
>        edge e_true, e_false;
>        basic_block bb = gimple_bb (cond);
>        extract_true_false_edges_from_block (bb, &e_true, &e_false);
>
> one example is
>
> <bb 176> [local count: 14414059]:
> _128 = node_177(D)->typed.type;
> pretmp_413 = MEM[(const union tree_node *)_128].base.code;
> _431 = pretmp_413 + 65519;
> if (_128 == 0B)
>   goto <bb 199>; [18.09%]
> else
>   goto <bb 177>; [81.91%]
>
> where m_imports for the path is just _128 and the range computed is
> false while the ranger query returns VARYING.  But
> path_range_query::range_defined_in_block does
>
>   if (bb && POINTER_TYPE_P (TREE_TYPE (name)))
>     m_ranger->m_cache.m_exit.maybe_adjust_range (r, name, bb);
>
> which adjusts the range to ~[0, 0], probably because of the
> dereference in the following stmt.
>
> Why does fold_range not do this when folding the exit test?  Is there
> a way to make it do so?  It looks like the only routine using this
> in gimple-range.cc is range_on_edge and there it's used for e->src
> after calling range_on_exit for e->src (why's it not done in
> range_on_exit?).  A testcase for this is

Andrew's gonna have to answer this one, because I'm just a user of the
infer_range infrastructure.  But yes, you're right... fold_range
doesn't seem to take into account side-effects such as non-null.

Aldy

>
> int foo (int **p, int i)
> {
>   int *q = *p;
>   int res = *q + i;
>   if (q)
>     return res;
>   return -1;
> }
>
> which we "thread" with the path and with the above ICEs because
> fold_range doesn't get that if (q) is always true.  Without the
> patch ethread doesn't want to duplicate the block (it's too large)
> but threadfull will if you disable evrp (if you remove the increment
> by 'i' it again won't since nothing interesting prevails and it
> won't go to BB 0 and fails to pick up a thread of length > 1):
>
> Checking profitability of path (backwards):  bb:2 (6 insns) bb:0
>   Control statement insns: 2
>   Overall: 4 insns
>   [1] Registering jump thread: (0, 2) incoming edge;  (2, 3) nocopy;
> path: 0->2->3 SUCCESS
> Removing basic block 2
> ;; basic block 2, loop depth 0
> ;;  pred:
> _1 = *p_6(D);
> _2 = (long unsigned int) n_7(D);
> _3 = _2 * 4;
> q_8 = _1 + _3;
> res_9 = *q_8;
> if (q_8 != 0B)
>   goto <bb 3>; [98.33%]
> else
>   goto <bb 4>; [1.67%]
> ;;  succ:       3
> ;;              4
>
> ... (it copied BB 2) ...
>
> int foo (int * * p, int n)
> {
>   int res;
>   int * q;
>   int * _10;
>   long unsigned int _11;
>   long unsigned int _12;
>
>   <bb 2> [local count: 1073741824]:
>   _10 = *p_6(D);
>   _11 = (long unsigned int) n_7(D);
>   _12 = _11 * 4;
>   q_13 = _10 + _12;
>   res_14 = *q_13;
>   return res_14;
>

Reply via email to