on 2021/8/3 下午8:08, Richard Biener wrote: > On Fri, Jul 30, 2021 at 7:20 AM Kewen.Lin <li...@linux.ibm.com> wrote: >> >> on 2021/7/29 下午4:01, Richard Biener wrote: >>> On Fri, Jul 23, 2021 at 10:41 AM Kewen.Lin <li...@linux.ibm.com> wrote: >>>> >>>> on 2021/7/22 下午8:56, Richard Biener wrote: >>>>> On Tue, Jul 20, 2021 at 4:37 >>>>> PM Kewen.Lin <li...@linux.ibm.com> wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> This v2 has addressed some review comments/suggestions: >>>>>> >>>>>> - Use "!=" instead of "<" in function operator!= (const Iter &rhs) >>>>>> - Add new CTOR loops_list (struct loops *loops, unsigned flags) >>>>>> to support loop hierarchy tree rather than just a function, >>>>>> and adjust to use loops* accordingly. >>>>> >>>>> I actually meant struct loop *, not struct loops * ;) At the point >>>>> we pondered to make loop invariant motion work on single >>>>> loop nests we gave up not only but also because it iterates >>>>> over the loop nest but all the iterators only ever can process >>>>> all loops, not say, all loops inside a specific 'loop' (and >>>>> including that 'loop' if LI_INCLUDE_ROOT). So the >>>>> CTOR would take the 'root' of the loop tree as argument. >>>>> >>>>> I see that doesn't trivially fit how loops_list works, at least >>>>> not for LI_ONLY_INNERMOST. But I guess FROM_INNERMOST >>>>> could be adjusted to do ONLY_INNERMOST as well? >>>>> >>>> >>>> >>>> Thanks for the clarification! I just realized that the previous >>>> version with struct loops* is problematic, all traversal is >>>> still bounded with outer_loop == NULL. I think what you expect >>>> is to respect the given loop_p root boundary. Since we just >>>> record the loops' nums, I think we still need the function* fn? >>> >>> Would it simplify things if we recorded the actual loop *? >>> >> >> I'm afraid it's unsafe to record the loop*. I had the same >> question why the loop iterator uses index rather than loop* when >> I read this at the first time. I guess the design of processing >> loops allows its user to update or even delete the folllowing >> loops to be visited. For example, when the user does some tricks >> on one loop, then it duplicates the loop and its children to >> somewhere and then removes the loop and its children, when >> iterating onto its children later, the "index" way will check its >> validity by get_loop at that point, but the "loop *" way will >> have some recorded pointers to become dangling, can't do the >> validity check on itself, seems to need a side linear search to >> ensure the validity. >> >>> There's still the to_visit reserve which needs a bound on >>> the number of loops for efficiency reasons. >>> >> >> Yes, I still keep the fn in the updated version. >> >>>> So I add one optional argument loop_p root and update the >>>> visiting codes accordingly. Before this change, the previous >>>> visiting uses the outer_loop == NULL as the termination condition, >>>> it perfectly includes the root itself, but with this given root, >>>> we have to use it as the termination condition to avoid to iterate >>>> onto its possible existing next. >>>> >>>> For LI_ONLY_INNERMOST, I was thinking whether we can use the >>>> code like: >>>> >>>> struct loops *fn_loops = loops_for_fn (fn)->larray; >>>> for (i = 0; vec_safe_iterate (fn_loops, i, &aloop); i++) >>>> if (aloop != NULL >>>> && aloop->inner == NULL >>>> && flow_loop_nested_p (tree_root, aloop)) >>>> this->to_visit.quick_push (aloop->num); >>>> >>>> it has the stable bound, but if the given root only has several >>>> child loops, it can be much worse if there are many loops in fn. >>>> It seems impossible to predict the given root loop hierarchy size, >>>> maybe we can still use the original linear searching for the case >>>> loops_for_fn (fn) == root? But since this visiting seems not so >>>> performance critical, I chose to share the code originally used >>>> for FROM_INNERMOST, hope it can have better readability and >>>> maintainability. >>> >>> I was indeed looking for something that has execution/storage >>> bound on the subtree we're interested in. If we pull the CTOR >>> out-of-line we can probably keep the linear search for >>> LI_ONLY_INNERMOST when looking at the whole loop tree. >>> >> >> OK, I've moved the suggested single loop tree walker out-of-line >> to cfgloop.c, and brought the linear search back for >> LI_ONLY_INNERMOST when looking at the whole loop tree. >> >>> It just seemed to me that we can eventually re-use a >>> single loop tree walker for all orders, just adjusting the >>> places we push. >>> >> >> Wow, good point! Indeed, I have further unified all orders >> handlings into a single function walk_loop_tree. >> >>>> >>>> Bootstrapped and regtested on powerpc64le-linux-gnu P9, >>>> x86_64-redhat-linux and aarch64-linux-gnu, also >>>> bootstrapped on ppc64le P9 with bootstrap-O3 config. >>>> >>>> Does the attached patch meet what you expect? >>> >>> So yeah, it's probably close to what is sensible. Not sure >>> whether optimizing the loops for the !only_push_innermost_p >>> case is important - if we manage to produce a single >>> walker with conditionals based on 'flags' then IPA-CP should >>> produce optimal clones as well I guess. >>> >> >> Thanks for the comments, the updated v2 is attached. >> Comparing with v1, it does: >> >> - Unify one single loop tree walker for all orders. >> - Move walk_loop_tree out-of-line to cfgloop.c. >> - Keep the linear search for LI_ONLY_INNERMOST with >> tree_root of fn loops. >> - Use class loop * instead of loop_p. >> >> Bootstrapped & regtested on powerpc64le-linux-gnu Power9 >> (with/without the hunk for LI_ONLY_INNERMOST linear search, >> it can have the coverage to exercise LI_ONLY_INNERMOST >> in walk_loop_tree when "without"). >> >> Is it ok for trunk? > > Looks good to me. I think that the 'mn' was an optimization > for the linear walk and it's cheaper to pointer test against > the actual 'root' loop (no need to dereference). Thus > > + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) > { > - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); > i++) > + class loop *aloop; > + unsigned int i; > + for (i = 0; vec_safe_iterate (loops->larray, i, &aloop); i++) > if (aloop != NULL > && aloop->inner == NULL > - && aloop->num >= mn) > + && aloop->num != mn) > this->to_visit.quick_push (aloop->num); > > could elide the aloop->num != mn check and start iterating from 1, > since loops->tree_root->num == 0 > > and the walk_loop_tree could simply do > > class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; > > and pointer test aloop against exclude. That avoids the idea that > 'mn' is a vehicle to exclude one random loop from the iteration. >
Good idea! Thanks for the comments! The attached v3 has addressed the review comments on "mn". Bootstrapped & regtested again on powerpc64le-linux-gnu Power9 (with/without the hunk for LI_ONLY_INNERMOST linear search). Is it ok for trunk? BR, Kewen ----- gcc/ChangeLog: * cfgloop.h (loops_list::loops_list): Add one optional argument root and adjust accordingly, update loop tree walking and factor out to ... * cfgloop.c (loops_list::walk_loop_tree): ...this. New function.
--- gcc/cfgloop.c | 65 ++++++++++++++++++++++++++++++++ gcc/cfgloop.h | 100 ++++++++++++++++++++------------------------------ 2 files changed, 105 insertions(+), 60 deletions(-) diff --git a/gcc/cfgloop.c b/gcc/cfgloop.c index 6284ae292b6..afbaa216ce5 100644 --- a/gcc/cfgloop.c +++ b/gcc/cfgloop.c @@ -2104,3 +2104,68 @@ mark_loop_for_removal (loop_p loop) loop->latch = NULL; loops_state_set (LOOPS_NEED_FIXUP); } + +/* Starting from loop tree ROOT, walk loop tree as the visiting + order specified by FLAGS. The supported visiting orders + are: + - LI_ONLY_INNERMOST + - LI_FROM_INNERMOST + - Preorder (if neither of above is specified) */ + +void +loops_list::walk_loop_tree (class loop *root, unsigned flags) +{ + bool only_innermost_p = flags & LI_ONLY_INNERMOST; + bool from_innermost_p = flags & LI_FROM_INNERMOST; + bool preorder_p = !(only_innermost_p || from_innermost_p); + class loop *exclude = flags & LI_INCLUDE_ROOT ? NULL : root; + + /* Early handle root without any inner loops, make later + processing simpler, that is all loops processed in the + following while loop are impossible to be root. */ + if (!root->inner) + { + if (root != exclude) + this->to_visit.quick_push (root->num); + return; + } + + class loop *aloop; + for (aloop = root; + aloop->inner != NULL; + aloop = aloop->inner) + { + if (preorder_p && aloop != exclude) + this->to_visit.quick_push (aloop->num); + continue; + } + + while (1) + { + gcc_assert (aloop != root); + if (from_innermost_p || aloop->inner == NULL) + this->to_visit.quick_push (aloop->num); + + if (aloop->next) + { + for (aloop = aloop->next; + aloop->inner != NULL; + aloop = aloop->inner) + { + if (preorder_p) + this->to_visit.quick_push (aloop->num); + continue; + } + } + else if (loop_outer (aloop) == root) + break; + else + aloop = loop_outer (aloop); + } + + /* When visiting from innermost, we need to consider root here + since the previous while loop doesn't handle it. */ + if (from_innermost_p && root != exclude) + this->to_visit.quick_push (root->num); +} + diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h index d5eee6b4840..fed2b0daf4b 100644 --- a/gcc/cfgloop.h +++ b/gcc/cfgloop.h @@ -669,13 +669,15 @@ as_const (T &t) } /* A list for visiting loops, which contains the loop numbers instead of - the loop pointers. The scope is restricted in function FN and the - visiting order is specified by FLAGS. */ + the loop pointers. If the loop ROOT is offered (non-null), the visiting + will start from it, otherwise it would start from the tree_root of + loops_for_fn (FN) instead. The scope is restricted in function FN and + the visiting order is specified by FLAGS. */ class loops_list { public: - loops_list (function *fn, unsigned flags); + loops_list (function *fn, unsigned flags, class loop *root = nullptr); template <typename T> class Iter { @@ -750,6 +752,10 @@ public: } private: + /* Walk loop tree starting from ROOT as the visiting order specified + by FLAGS. */ + void walk_loop_tree (class loop *root, unsigned flags); + /* The function we are visiting. */ function *fn; @@ -782,76 +788,50 @@ loops_list::Iter<T>::fill_curr_loop () } /* Set up the loops list to visit according to the specified - function scope FN and iterating order FLAGS. */ + function scope FN and iterating order FLAGS. If ROOT is + not null, the visiting would start from it, otherwise it + will start from tree_root of loops_for_fn (FN). */ -inline loops_list::loops_list (function *fn, unsigned flags) +inline loops_list::loops_list (function *fn, unsigned flags, class loop *root) { - class loop *aloop; - unsigned i; - int mn; + struct loops *loops = loops_for_fn (fn); + gcc_assert (!root || loops); + + /* Check mutually exclusive flags should not co-exist. */ + unsigned checked_flags = LI_ONLY_INNERMOST | LI_FROM_INNERMOST; + gcc_assert ((flags & checked_flags) != checked_flags); this->fn = fn; - if (!loops_for_fn (fn)) + if (!loops) return; + class loop *tree_root = root ? root : loops->tree_root; + this->to_visit.reserve_exact (number_of_loops (fn)); - mn = (flags & LI_INCLUDE_ROOT) ? 0 : 1; - if (flags & LI_ONLY_INNERMOST) + /* When root is tree_root of loops_for_fn (fn) and the visiting + order is LI_ONLY_INNERMOST, we would like to use linear + search here since it has a more stable bound than the + walk_loop_tree. */ + if (flags & LI_ONLY_INNERMOST && tree_root == loops->tree_root) { - for (i = 0; vec_safe_iterate (loops_for_fn (fn)->larray, i, &aloop); i++) - if (aloop != NULL - && aloop->inner == NULL - && aloop->num >= mn) - this->to_visit.quick_push (aloop->num); - } - else if (flags & LI_FROM_INNERMOST) - { - /* Push the loops to LI->TO_VISIT in postorder. */ - for (aloop = loops_for_fn (fn)->tree_root; - aloop->inner != NULL; - aloop = aloop->inner) - continue; - - while (1) + gcc_assert (tree_root->num == 0); + if (tree_root->inner == NULL) { - if (aloop->num >= mn) - this->to_visit.quick_push (aloop->num); - - if (aloop->next) - { - for (aloop = aloop->next; - aloop->inner != NULL; - aloop = aloop->inner) - continue; - } - else if (!loop_outer (aloop)) - break; - else - aloop = loop_outer (aloop); + if (flags & LI_INCLUDE_ROOT) + this->to_visit.quick_push (0); + + return; } + + class loop *aloop; + unsigned int i; + for (i = 1; vec_safe_iterate (loops->larray, i, &aloop); i++) + if (aloop != NULL && aloop->inner == NULL) + this->to_visit.quick_push (aloop->num); } else - { - /* Push the loops to LI->TO_VISIT in preorder. */ - aloop = loops_for_fn (fn)->tree_root; - while (1) - { - if (aloop->num >= mn) - this->to_visit.quick_push (aloop->num); - - if (aloop->inner != NULL) - aloop = aloop->inner; - else - { - while (aloop != NULL && aloop->next == NULL) - aloop = loop_outer (aloop); - if (aloop == NULL) - break; - aloop = aloop->next; - } - } - } + walk_loop_tree (tree_root, flags); } /* The properties of the target. */ -- 2.27.0