On Tue, May 13, 2014 at 4:59 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Sun, May 11, 2014 at 2:49 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >> On Thu, May 8, 2014 at 5:08 PM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>> On Tue, May 6, 2014 at 6:44 PM, Richard Biener >>> <richard.guent...@gmail.com> wrote: >>>> On Tue, May 6, 2014 at 10:39 AM, Bin.Cheng <amker.ch...@gmail.com> wrote: >>>>> On Fri, Dec 6, 2013 at 6:19 PM, Richard Biener >>>>> <richard.guent...@gmail.com> wrote: >>> >>>>> Hi, >>>>> I split the patch into two and updated the test case. >>>>> The patches pass bootstrap/tests on x86/x86_64, also pass test on arm >>>>> cortex-m3. >>>>> Is it OK? >>>>> >>>>> Thanks, >>>>> bin >>>>> >>>>> PATCH 1: >>>>> >>>>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> * gcc/tree-affine.c (tree_to_aff_combination): Handle MEM_REF for >>>>> core part of address expressions. >>>> >>>> No gcc/ in the changelog >>>> >>>> Simplify that by using aff_combination_add_cst: >>>> >>>> + if (TREE_CODE (core) == MEM_REF) >>>> + { >>>> + aff_combination_add_cst (comb, mem_ref_offset (core)); >>>> + core = TREE_OPERAND (core, 0); >>>> >>>> patch 1 is ok with that change. >>> >>> Installed with below change because of wide-int merge: >>> - core = build_fold_addr_expr (core); >>> + if (TREE_CODE (core) == MEM_REF) >>> + { >>> + aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, >>> 1))); >>> + core = TREE_OPERAND (core, 0); >>> >>>> >>>>> PATCH 2: >>>>> >>>>> 2014-05-06 Bin Cheng <bin.ch...@arm.com> >>>>> >>>>> * gcc/tree-ssa-loop-ivopts.c (contain_complex_addr_expr): New. >>>>> (alloc_iv): Lower base expressions containing ADDR_EXPR. >>>> >>>> So this "lowers" addresses(?) which are based on &<not-decl>, >>>> like &a[0] + 4 but not &a + 4. I question this odd choice. ISTR >>> &a+4 is already in its lowered form, what we want to handle is &EXPR + >>> 4, in which EXPR is MEM_REF/ARRAY_REF, etc.. >>> >>>> when originally introducing address lowering (in rev. 204497) I was >>>> concerned about the cost? >>> Yes, you did. I still think the iv number is relative small for each >>> loop, thus the change won't cause compilation time issue considering >>> the existing tree<->affine transformation in ivopt. >>> I would like to collect more accurate time information for ivopt in >>> gcc bootstrap. Should I use "-ftime-report" then add all time slices >>> in TV_TREE_LOOP_IVOPTS category? Is there any better solutions? >>> Thanks. >>> >>>> >>>> Now I wonder why we bother to convert the lowered form back >>>> to trees to store it in ->base and not simply keep (and always >>>> compute) the affine expansion only. Thus, change struct iv >>>> to have aff_tree base instead of tree base? >>>> >>>> Can you see what it takes to do such change? At the point >>>> we replace uses we go into affine representation again anyway. >>> Good idea, I may have a look. >> After going through work flow of ivopt, I think it's practical to make >> such change and can help compilation time. Ivopt calls >> tree_to_aff_combination to convert iv->base into affine_tree whenever >> it tries to represent an iv_use by an iv_cand, i.e., N*M times for >> computing cost of each iv_use represented by each iv_cand, and M times >> for rewriting each iv_use. Here, N and M stands for number of iv_use >> and iv_cand. Also strip_offset (which is a recursive function) is >> called for iv->base also at complexity of O(NM), so it may be improved >> too. >> To make a start, it's possible to store both tree and affine_tree of >> base in struct iv, and use either of them whenever needed. >> >> It seems to me there are two potential issues of this change. First, >> though affine_tree of base is stored, we can't operate on it directly, >> which means we have to copy it before using it. > > You'd use it as unchanged operand to some aff_* function to avoid > actual copying of course. > >> Second, ivopt >> sometimes computes affine_tree of base in different type other than >> TREE_TYPE(base), we may need to do additional calls to >> aff_combination_convert to get wanted type. All these will introduce >> additional computation and compromise benefit of the change. > > Sure. > >> At last, I did some experiments on how many additional calls to >> tree_to_aff_combination are introduced by this patch. The data of >> bootstrap shows it's less than 3% comparing to calls made by current >> implementation of ivopt, which confirms that this patch shouldn't have >> issue of compilation time. >> >> Any comments? > > I'd see the use of affines as making the algorithmic structure of > IVOPTs clearer and more consistent (also with possibly using > the _expand variants later to get even more simplifications). Yes, one example, it's possible to rewrite iv uses in a loop-invariant sensitive manner if we can use affine in the whole process. Currently fold_tree routines tend to decompose invariant into different expressions.
> > I don't want to force you to do this change but I thought it may > help further changes (as you seem to be doing a lot of IVOPTs > changes lately). > > So - the patch is ok as-is then, but please consider refactoring Patch installed. > IVOPTs code when you do changes. It's current structure > is somewhat awkward. Understood. I will continue to look at it when have proper time. Thanks, bin -- Best Regards.