On Wed, Dec 10, 2014 at 6:58 AM, Jeff Law <l...@redhat.com> wrote: > On 12/05/14 05:15, Bin Cheng wrote: >> >> Hi, >> Though PR62178 is hidden by recent cost change in aarch64 backend, the >> ivopt >> issue still exists. >> >> Current candidate selecting algorithm tends to select fewer candidates >> given >> below reasons: >> 1) to better handle loops with many induction uses but the best choice >> is >> one generic basic induction variable; >> 2) to keep compilation time low. >> >> One fundamental weakness of the strategy is the opposite situation can't >> be >> handled properly sometimes. For these cases the best choice is each >> induction variable has its own candidate. >> This patch fixes the problem by shuffling candidate set after fix-point is >> reached by current implementation. The reason why this strategy works is >> it >> replaces candidate set by selecting local optimal candidate for some >> induction uses, and the new candidate set (has lower cost) is exact what >> we >> want in the mentioned case. Instrumentation data shows this can find >> better >> candidates set for ~6% loops in spec2006 on x86_64, and ~4% on aarch64. >> >> This patch actually is extension to the first version patch posted at >> https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, that only adds >> another selecting pass with special seed set (more or less like the >> shuffled >> set in this patch). Data also confirms this patch can find optimal sets >> for >> most loops found by the first one, as well as optimal sets for many new >> loops. >> >> Bootstrap and test on x86_64, no regression on benchmarks. Bootstrap and >> test on aarch64. >> Since this patch only selects candidate set with lower cost, any >> regressions >> revealed are latent bugs of other components in GCC. >> I also collected GCC bootstrap time on x86_64, no regression either. >> Is this OK? >> >> 2014-12-03 Bin chengbin.ch...@arm.com >> >> PR tree-optimization/62178 >> * tree-ssa-loop-ivopts.c (iv_ca_replace): New function. >> (try_improve_iv_set): Shuffle candidates set in order to handle >> case in which candidate wrto each iv use should be selected. >> >> gcc/testsuite/ChangeLog >> 2014-12-03 Bin chengbin.ch...@arm.com >> >> PR tree-optimization/62178 >> * gcc.target/aarch64/pr62178.c: New test. >> >> >> pr62178-20141202.txt >> >> >> Index: gcc/tree-ssa-loop-ivopts.c >> =================================================================== >> --- gcc/tree-ssa-loop-ivopts.c (revision 217828) >> +++ gcc/tree-ssa-loop-ivopts.c (working copy) >> @@ -5718,6 +5718,85 @@ iv_ca_extend (struct ivopts_data *data, struct iv_ >> return cost; >> } >> >> +/* Try replacing candidates in IVS which are recorded by list ACT_DELTA >> to >> + lower cost candidates. CAND is the one won't be replaced. >> Replacement >> + of candidate is recorded in list DELTA. */
Thanks for the review. > > Is this better written as: > > Try replacing candidates in IVS with IVs related to CAND (which is not > changed) if doing so lowers the IV cost. ACT_DELTA is the recorded > list of candidates. Replacement of candidate is recorded in list DELTA. > > ? Will refine that. > > >> + if (data->consider_all_candidates) >> + { >> + for (j = 0; j < n_iv_cands (data); j++) >> + { >> + if (j == old_cp->cand->id) >> + continue; >> + >> + cnd = iv_cand (data, j); >> + cp = get_use_iv_cost (data, use, cnd); >> + if (!cp) >> + continue; >> + >> + if (best_cp == NULL || cheaper_cost_pair (cp, best_cp)) >> + best_cp = cp; >> + } >> + } >> + else >> + { >> + EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, j, bi) >> + { >> + if (j == old_cp->cand->id) >> + continue; >> + >> + cnd = iv_cand (data, j); >> + cp = get_use_iv_cost (data, use, cnd); >> + if (!cp) >> + continue; >> + >> + if (best_cp == NULL || cheaper_cost_pair (cp, best_cp)) >> + best_cp = cp; >> + } >> + } > > The loop bodies here are duplicated. Can you factor them into a function so > that this looks something like > > if (data->consider_all_candidates) > { > for (...) > refactored_code (some arguments) > } > else > { > EXECUTE_IF_SET_IN_BITMAP (...) > refactored_code (some arguments) > Will do that. >> @@ -6042,8 +6121,50 @@ try_improve_iv_set (struct ivopts_data *data, stru >> /* Try removing the candidates from the set instead. */ >> best_cost = iv_ca_prune (data, ivs, NULL, &best_delta); >> >> - /* Nothing more we can do. */ >> if (!best_delta) >> + { >> + /* So far candidate selecting algorithm tends to choose fewer >> IVs >> + so that it can handle cases in which loops have many >> variables >> + but the best choice is often to use only one general biv. >> One >> + weakness is it can't handle opposite cases, in which >> different >> + candidates should be chosen with respect to each use. To >> solve >> + the problem, we replace candidate of some uses with lower >> cost >> + one, thus general algorithm can have a chance to find optimal >> + set for these cases. */ > > So, in essence we've computed a best cost with "minimal" IVs and you're > using that result as an initial state for expanding the IV set. You expand > on a candidate by candidate basis if and only if the estimated cost lowers. > Right? Yes. > > At each expansion step you keep a single candidate fixed and you try to > derive other IVs from that fixed IV if doing so lowers the cost? Right? More or less, it like below process: Candidates set IVS (fixed point by current algorithm) --> Try extend IVS with a new one CAND if it has lower local cost for any uses(There will be a candidate set IVS', candidates in this subset of IVS are replaced with CAND for some uses) --> Replace rest candidates in IVS if they are in the set of IVS' with any other candidates having lower local cost) --> Prune the new set. The first/last steps are actually from current implementation, I added the middle step. Generally yes, it expands fixed point got by current algorithm to see if there is lower cost choice. Maybe the comment wasn't clear enough, I will see how to refine it. > > > That's what it looks like to me when reading the code/comments. Just want > to make sure it's working the way I think it is. > > FWIW, if I read the slides correctly, GCC's IV code was called out as being > one of the reasons GCC is generating better code for AArch64 than LLVM at a > recent LLVM conference. Glad to see that we're called out in that way and > that we continue to try to improve in that space as well. > > Jeff I don't know what to say... IVOPT doesn't work very well on aarch64 and I have a not short todo list for it. Of course some of them are target indenpendent. Thanks, bin