On Wed, Aug 4, 2021 at 12:47 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>
> on 2021/8/4 下午6:01, Richard Biener wrote:
> > On Wed, Aug 4, 2021 at 4:36 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>
> >> on 2021/8/3 下午8:08, Richard Biener wrote:
> >>> On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>
> >>>> on 2021/7/29 下午4:01, Richard Biener wrote:
> >>>>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>>>
> >>>>>> on 2021/7/22 下午8:56, Richard Biener wrote:
> >>>>>>> On Tue, Jul 20, 2021 at 4:37
> >>>>>>> PM Kewen.Lin <li...@linux.ibm.com> wrote:
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> This v2 has addressed some review comments/suggestions:
> >>>>>>>>
> >>>>>>>>   - Use "!=" instead of "<" in function operator!= (const Iter &rhs)
> >>>>>>>>   - Add new CTOR loops_list (struct loops *loops, unsigned flags)
> >>>>>>>>     to support loop hierarchy tree rather than just a function,
> >>>>>>>>     and adjust to use loops* accordingly.
> >>>>>>>
> >>>>>>> I actually meant struct loop *, not struct loops * ;)  At the point
> >>>>>>> we pondered to make loop invariant motion work on single
> >>>>>>> loop nests we gave up not only but also because it iterates
> >>>>>>> over the loop nest but all the iterators only ever can process
> >>>>>>> all loops, not say, all loops inside a specific 'loop' (and
> >>>>>>> including that 'loop' if LI_INCLUDE_ROOT).  So the
> >>>>>>> CTOR would take the 'root' of the loop tree as argument.
> >>>>>>>
> >>>>>>> I see that doesn't trivially fit how loops_list works, at least
> >>>>>>> not for LI_ONLY_INNERMOST.  But I guess FROM_INNERMOST
> >>>>>>> could be adjusted to do ONLY_INNERMOST as well?
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Thanks for the clarification!  I just realized that the previous
> >>>>>> version with struct loops* is problematic, all traversal is
> >>>>>> still bounded with outer_loop == NULL.  I think what you expect
> >>>>>> is to respect the given loop_p root boundary.  Since we just
> >>>>>> record the loops' nums, I think we still need the function* fn?
> >>>>>
> >>>>> Would it simplify things if we recorded the actual loop *?
> >>>>>
> >>>>
> >>>> I'm afraid it's unsafe to record the loop*.  I had the same
> >>>> question why the loop iterator uses index rather than loop* when
> >>>> I read this at the first time.  I guess the design of processing
> >>>> loops allows its user to update or even delete the folllowing
> >>>> loops to be visited.  For example, when the user does some tricks
> >>>> on one loop, then it duplicates the loop and its children to
> >>>> somewhere and then removes the loop and its children, when
> >>>> iterating onto its children later, the "index" way will check its
> >>>> validity by get_loop at that point, but the "loop *" way will
> >>>> have some recorded pointers to become dangling, can't do the
> >>>> validity check on itself, seems to need a side linear search to
> >>>> ensure the validity.
> >>>>
> >>>>> There's still the to_visit reserve which needs a bound on
> >>>>> the number of loops for efficiency reasons.
> >>>>>
> >>>>
> >>>> Yes, I still keep the fn in the updated version.
> >>>>
> >>>>>> So I add one optional argument loop_p root and update the
> >>>>>> visiting codes accordingly.  Before this change, the previous
> >>>>>> visiting uses the outer_loop == NULL as the termination condition,
> >>>>>> it perfectly includes the root itself, but with this given root,
> >>>>>> we have to use it as the termination condition to avoid to iterate
> >>>>>> onto its possible existing next.
> >>>>>>
> >>>>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the
> >>>>>> code like:
> >>>>>>
> >>>>>>     struct loops *fn_loops = loops_for_fn (fn)->larray;
> >>>>>>     for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++)
> >>>>>>         if (aloop != NULL
> >>>>>>             && aloop->inner == NULL
> >>>>>>             && flow_loop_nested_p (tree_root, aloop))
> >>>>>>              this->to_visit.quick_push (aloop->num);
> >>>>>>
> >>>>>> it has the stable bound, but if the given root only has several
> >>>>>> child loops, it can be much worse if there are many loops in fn.
> >>>>>> It seems impossible to predict the given root loop hierarchy size,
> >>>>>> maybe we can still use the original linear searching for the case
> >>>>>> loops_for_fn (fn) == root?  But since this visiting seems not so
> >>>>>> performance critical, I chose to share the code originally used
> >>>>>> for FROM_INNERMOST, hope it can have better readability and
> >>>>>> maintainability.
> >>>>>
> >>>>> I was indeed looking for something that has execution/storage
> >>>>> bound on the subtree we're interested in.  If we pull the CTOR
> >>>>> out-of-line we can probably keep the linear search for
> >>>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
> >>>>>
> >>>>
> >>>> OK, I've moved the suggested single loop tree walker out-of-line
> >>>> to cfgloop.c, and brought the linear search back for
> >>>> LI_ONLY_INNERMOST when looking at the whole loop tree.
> >>>>
> >>>>> It just seemed to me that we can eventually re-use a
> >>>>> single loop tree walker for all orders, just adjusting the
> >>>>> places we push.
> >>>>>
> >>>>
> >>>> Wow, good point!  Indeed, I have further unified all orders
> >>>> handlings into a single function walk_loop_tree.
> >>>>
> >>>>>>
> >>>>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9,
> >>>>>> x86_64-redhat-linux and aarch64-linux-gnu, also
> >>>>>> bootstrapped on ppc64le P9 with bootstrap-O3 config.
> >>>>>>
> >>>>>> Does the attached patch meet what you expect?
> >>>>>
> >>>>> So yeah, it's probably close to what is sensible.  Not sure
> >>>>> whether optimizing the loops for the !only_push_innermost_p
> >>>>> case is important - if we manage to produce a single
> >>>>> walker with conditionals based on 'flags' then IPA-CP should
> >>>>> produce optimal clones as well I guess.
> >>>>>
> >>>>
> >>>> Thanks for the comments, the updated v2 is attached.
> >>>> Comparing with v1, it does:
> >>>>
> >>>>   - Unify one single loop tree walker for all orders.
> >>>>   - Move walk_loop_tree out-of-line to cfgloop.c.
> >>>>   - Keep the linear search for LI_ONLY_INNERMOST with
> >>>>     tree_root of fn loops.
> >>>>   - Use class loop * instead of loop_p.
> >>>>
> >>>> Bootstrapped & regtested on powerpc64le-linux-gnu Power9
> >>>> (with/without the hunk for LI_ONLY_INNERMOST linear search,
> >>>> it can have the coverage to exercise LI_ONLY_INNERMOST
> >>>> in walk_loop_tree when "without").
> >>>>
> >>>> Is it ok for trunk?
> >>>
> >>> Looks good to me.  I think that the 'mn' was an optimization
> >>> for the linear walk and it's cheaper to pointer test against
> >>> the actual 'root' loop (no need to dereference).  Thus
> >>>
> >>> +  if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root)
> >>>      {
> >>> -      for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, 
> >>> &aloop); i++)
> >>> +      class loop *aloop;
> >>> +      unsigned int i;
> >>> +      for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++)
> >>>         if (aloop != NULL
> >>>             && aloop->inner == NULL
> >>> -           && aloop->num >= mn)
> >>> +           && aloop->num != mn)
> >>>           this->to_visit.quick_push (aloop->num);
> >>>
> >>> could elide the aloop->num != mn check and start iterating from 1,
> >>> since loops->tree_root->num == 0
> >>>
> >>> and the walk_loop_tree could simply do
> >>>
> >>>   class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root;
> >>>
> >>> and pointer test aloop against exclude.  That avoids the idea that
> >>> 'mn' is a vehicle to exclude one random loop from the iteration.
> >>>
> >>
> >> Good idea!  Thanks for the comments!  The attached v3 has addressed
> >> the review comments on "mn".
> >>
> >> Bootstrapped & regtested again on powerpc64le-linux-gnu Power9
> >> (with/without the hunk for LI_ONLY_INNERMOST linear search).
> >>
> >> Is it ok for trunk?
> >
> > +  /* Early handle root without any inner loops, make later
> > +     processing simpler, that is all loops processed in the
> > +     following while loop are impossible to be root.  */
> > +  if (!root->inner)
> > +    {
> > +      if (root != exclude)
> > +       this->to_visit.quick_push (root->num);
> > +      return;
> > +    }
> >
> > could be
> >
> >    if (!root->inner)
> >      {
> >         if (flags & LI_INCLUDE_ROOT)
> >           this->to_visit.quick_push (root->num);
> >      }
> >
>
> OK, I thought wrongly that all places with "exclude" might be
> more consistent, so gave up to use flags directly.  :)
>
> > +  class loop *aloop;
> > +  for (aloop = root;
> > +       aloop->inner != NULL;
> > +       aloop = aloop->inner)
> > +    {
> > +      if (preorder_p && aloop != exclude)
> > +       this->to_visit.quick_push (aloop->num);
> > +      continue;
> > +    }
> >
> > could be
> >
> > +  class loop *aloop;
> > +  for (aloop = root->inner;
> > +       aloop->inner != NULL;
> > +       aloop = aloop->inner)
> > +    {
> > +      if (preorder_p)
> > +       this->to_visit.quick_push (aloop->num);
> > +      continue;
> > +    }
> >
>
> This seems wrong?  For preorder_p, we might miss to push root
> when root->inner isn't NULL.  The below "else if" makes it safe.

oops, yes.

> @@ -2125,17 +2125,19 @@ loops_list::walk_loop_tree (class loop *root, 
> unsigned flags)
>       following while loop are impossible to be root.  */
>    if (!root->inner)
>      {
> -      if (root != exclude)
> +      if (flags & LI_INCLUDE_ROOT)
>         this->to_visit.quick_push (root->num);
>        return;
>      }
> +  else if (preorder_p && flags & LI_INCLUDE_ROOT)
> +    this->to_visit.quick_push (root->num);
>
> > +  /* When visiting from innermost, we need to consider root here
> > +     since the previous while loop doesn't handle it.  */
> > +  if (from_innermost_p && root != exclude)
> > +    this->to_visit.quick_push (root->num);
> >
> > could be like the first.  I think that's more clear even.  Sorry for
> > finding a better solution again.
> >
>
> It's totally fine, thanks for all the nice suggestions!  :)
>
> > OK with that change
> >
>
> Thanks, the attached diff is the delta against v3, excepting for
> the "else if", the other changes follow the suggestion above.
>
> Could you have another look to confirm?

I'm missing the line that removes 'exclude', other than that it looks
OK.

Thanks,
Richard.

> I'll do the full testing again before committing.
>
> BR,
> Kewen

Reply via email to