On Tue, Jul 14, 2015 at 2:54 PM, Richard Biener <richard.guent...@gmail.com> wrote: > On Mon, Jun 29, 2015 at 4:21 PM, Evgeniya Maenkova > <evgeniya.maenk...@gmail.com> wrote: >> On Mon, Jun 29, 2015 at 5:10 PM, Richard Biener >> <richard.guent...@gmail.com> wrote: >>> On Tue, Jun 9, 2015 at 10:11 PM, Evgeniya Maenkova >>> <evgeniya.maenk...@gmail.com> wrote: >>>> On Tue, Jun 9, 2015 at 3:46 PM, Richard Biener >>>> <richard.guent...@gmail.com> wrote: >>>>> On Fri, May 29, 2015 at 3:14 PM, Evgeniya Maenkova >>>>> <evgeniya.maenk...@gmail.com> wrote: >>>>>> Hi Richard, >>>>>> >>>>>> Here is some explanation. I hope you let me know if I need to clarify >>>>>> something. >>>>>> >>>>>> Also, you asked me about concrete example, to make sure you don’t miss >>>>>> my answer here is the link: >>>>>> https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02417.html. >>>>>> >>>>>> Also, I doubt whether it’s convenient for you to create a build with >>>>>> my patch or not. May be to clarify things you could send me some >>>>>> examples/concrete cases, then I’ll compile them with >>>>>> –fdump-tree-loopinit-details and –fdump-tree-lim-details and send you >>>>>> these dumps. May be these dumps will be useful. (I’ll only disable >>>>>> cleanup_cfg TODO after lim to let you know the exact picture after >>>>>> lim). >>>>>> >>>>>> What do you think? >>>>>> >>>>>> 1. invariantness _dom_walker – >>>>>> >>>>>> 1.1 for each GIMPLE_COND in given bb calls handle_cond_stmt to call >>>>>> for true and false edges handle_branch_edge, which calls SET_TARGET_OF >>>>>> for all bb ‘predicated’ by given GIMPLE_COND. >>>>>> >>>>>> SET_TARGET_OF sets in basic_blocks aux 2 facts: >>>>>> >>>>>> a) this is true or false edge; >>>>>> >>>>>> b) link to cond stmt; >>>>>> >>>>>> Handle_branch_edge works this way: >>>>>> >>>>>> If (cond1) >>>>>> >>>>>> { >>>>>> >>>>>> bb1; >>>>>> >>>>>> if (cond2} >>>>>> >>>>>> { >>>>>> >>>>>> bb2; >>>>>> >>>>>> } >>>>>> >>>>>> Being called for cond1, it sets cond1 as condition for both bb1 and >>>>>> bb2 (the whole branch for cond1, ie also for bb containing cond2), >>>>>> then this method will be called (as there is dominance order) for >>>>>> cond2 to correct things (ie to set cond2 as condition for bb2). >>>>> >>>>> Hmm, why not track the current condition as state during the DOM walk >>>>> and thus avoid processing more than one basic-block in handle_branch_edge? >>>>> Thus get rid of handle_branch_edge and instead do everything in >>>>> handle_cond_stmt >>>>> plus the dom-walkers BB visitor? >>>>> >>>> I need to look more carefully how to implement it, but I think I >>>> understand what you mean and this optimization of course looks >>>> reasonable to me. Will do. >>>> >>>>> I see you don't handle BBs with multiple predecessors - that's ok, but >>>>> are you sure you don't run into correctness issues when not marking such >>>>> BBs as predicated? This misses handling of, say >>>>> >>>>> if (a || b) >>>>> bb; >>>>> >>>>> which is a pity (but can be fixed later if desired). >>>>> >>>> I had some test (in gcc testsuite or bootstrap build) which worked >>>> incorrectly because of multiple predecessors. As far as I remember the >>>> situation was (further, will make some notes about such tests to >>>> clarify this better), I mean with previous version of my code which >>>> handled bb with 2 predecessors: >>>> if (a) >>>> tmpvar=something; >>>> while() >>>> if (a || b) >>>> basic_block {do something with tmpvar;} // I mean basic block >>>> predicated by bb with a and bb with b >>>> >>>> So, if a is false, I mean we didn't do tmpvar=something (outside >>>> loop), BUT we are in basick_block (we went by bb with b), we got >>>> Segmentation falt in basic_block {do something with tmpvar;}. >>>> >>>> I think we can reproduce all the details of this test if I remove not >>>> handling bb with 2 predecessors. >>>> >>>> So I wouldn't move bb with 2 predecessors (this is not always executed >>>> bb in any loop, not conditionally, they will not be moved at all). >>>> >>>> This is my more detail explanation on this point. Perhaps, I didn't >>>> understand your question about correctness. Could you repeat it in >>>> other words (based on this new clarification). >>>> >>>> So I think according to current code it will not be moved. What >>>> incorrectness do you mean? >>> >>> If the block isn't marked as predicated the question is whether it is >>> handled correctly or assumed to be unconditionally executed. >>> >>>>> I note that collecting predicates has similarities to what if-conversion >>>>> does in tree-ifcvt.c (even if its implementation is completely different, >>>>> of course). >>>>> >>>> >>>> Ok, I'll look at this. But could you please clarify your point? >>>> (Should I just take this into account with low priority and look at >>>> this later or you want some refactoring?) >>> >>> I just noted similar code exists elsewhere - it may be possible to >>> factor it out but I didn't investigate. And no, doing that isn't a >>> prerequesite >>> for this patch. >>> >>>>>> 1.2 As 1.1 goes we identify whether some bb is predicated by some >>>>>> condition or not. >>>>>> >>>>>> bb->aux->type will be [TRUE/FALSE]_TARGET_OF and >>>>>> bb->aux->cond_stmt=cond stmt (the nearest condition). >>>>>> >>>>>> If bb is always executed bb->aux->type = ALWAYS_EXECUTED_IN, >>>>>> bb->loop->loop (this info was available in the clean build). >>>>>> >>>>>> 1.3 As this walker is called in dominance order, information about >>>>>> condition is available when invariantness_dom_walker is called for >>>>>> given bb. So we can make some computations based on bb->aux >>>>>> structure. This is done in check_conditionally_executed. The main goal >>>>>> of this method is to check that the root condition is always executed >>>>>> in the loop. I did so to avoid situation like this >>>>>> >>>>>> Loop: >>>>>> >>>>>> Jmp somewhere; >>>>>> >>>>>> If (cond1) >>>>>> >>>>>> If (cond2) >>>>>> >>>>>> Etc >>>>>> >>>>>> By ‘root condition’ I mean cond1 in this case (the first cond in >>>>>> dominance order). >>>>> >>>>> Ok, but this can't happen (an uncontional jump to somehwere). It >>>>> will always be a conditional jump (a loop exit) and thus should >>>>> be handled already. >>>>> >>>>> Did you have a testcase that was mishandled? >>>> >>>> No, I have not such testcase. This is because I;m newbie at gcc and >>>> compilers. If you tell me this fact, let's just remove this check? >>>> Correct? >>> >>> Yes. >>> >>>>> >>>>>> If ‘root condition’ is not always executed we disable (free) all the >>>>>> info in bb->aux, ie all the blocks becomes neither conditionally nor >>>>>> always executed according to bb->aux. >>>>>> >>>>>> There is some optimization to don’t go each time trough the conditions >>>>>> chain (cond2->cond1), let me skip such details for now. >>>>>> >>>>>> >>>>>> >>>>>> 1.4 Then we calculate tgt_level (which is used in move_computations >>>>>> later) >>>>>> >>>>>> The patch is the same for phi and regular stmt (calculate_tgt_level) >>>>>> >>>>>> 1) If stmt is conditionally executed we compute max movement >>>>>> (determine_max_movement) starting from >>>>>> get_lim_data(condition)->max_loop. >>>>>> >>>>>> 2) If stmt is not cond executed as start level for >>>>>> determine_max_movement computations we choose ALWAYS_EXECUTED_IN. >>>>>> >>>>>> To clarify why, see the comment >>>>>> >>>>>> /* The reason why we don't set start_level for MOVE_POSSIBLE >>>>>> >>>>>> statements to more outer level is: this statement could depend >>>>>> on >>>>>> >>>>>> conditional statements and even probably is not defined >>>>>> without this >>>>>> >>>>>> condition. So either we should analyze these ones and move >>>>>> >>>>>> under condition somehow or keep more strong start_level . */ >>>>>> >>>>>> As you noted in your review there was some refactoring. Of course, I >>>>>> had no goal to refactor existing code, I intended to remove some >>>>>> duplication which I caused by my changes. I hope we discuss in >>>>>> details later if you don’t like some refactoring. >>>>> >>>>> Refactoring makes a big patch harder to review because it ends up >>>>> containing no-op changes. It also makes it appear bigger than it >>>>> really is ;) >>>>> >>>>> A reasonable solution is to factor out the refactoring to a separate >>>>> patch. >>>>> >>>> I see what you mean. >>>> >>>>>> 2. store_motion - for some stmts set flags to ignore predicated >>>>>> execution (I mean to move statements without conditions). >>>>> >>>>> That's for the existing "predicated" support as far as I can see. >>>>> >>>>>> >>>>>> >>>>>> 3. move_computations. The patch is doing something in the >>>>>> following 3 calls inside of this method. >>>>>> >>>>>> >>>>>> >>>>>> 3.1 (more details below) walker.walk – move computations(create if() >>>>>> structure) and store information to create phi nodes later (which >>>>>> statements require phi nodes, as they conditionally executed and there >>>>>> are still their uses on other levels, what are the names for such phi >>>>>> nodes, as we replace uses in here, in walker.walk.) >>>>> >>>>> What kind of uses are that? By definition all stmts executed >>>>> under condition A that are used in code not under condition A have >>>>> a PHI node associated. This is why the existing conditional movement >>>>> handling moves PHI nodes (and all dependent computations). >>>>> >>>>> Are you thinking of >>>>> >>>>> for (;;) >>>>> if (a) x = 1; >>>>> >>>>> .. use of x >>>>> >>>>> ? >>>> No, I think in this case phi node of x will be used in initial >>>> (==loopinit) code (in 'use of x') instead of some version of x (which >>>> I move). >>>> >>>>>If you move the PHI node for x then you don't need to insert any other >>>>> PHI nodes (and you _do_ have to move the PHI node anyway). >>>> >>>>> >>>>>> 3.2 (more details below) create_phi_nodes – create phi nodes for >>>>>> statements which require that (see 3.1) >>>>> >>>>> Only PHI nodes need PHI node creation and only its dependences >>>>> might need predication. >>>>> >>>>> That is, if you move a conditionally executed statement but not >>>>> any PHI node it feeds then the movement is useless. >>>>> >>>>> But maybe I am missing something about the PHI business. >>>>> >>>> Ok. I have different understanding on this (and am afraid of this of >>>> course :) ). >>>> >>>> See the example: >>>> void bar (long); >>>> void foo (int cond1, int cond2, int cond3, int m, int n) >>>> { >>>> unsigned x; >>>> unsigned inv1; >>>> >>>> for (x = 0; x < 1000; ++x) >>>> { >>>> if (cond1) >>>> { >>>> int tmp_inv1 = m*n; >>>> non_inv1 = tmp_inv1 + x; >>>> } >>>> } >>>> bar (non_inv1); >>>> } >>>> >>>> This is working example, will attach loopinit and lim1 dumps. >>>> There is no phi node for tmp_inv1 in loopinitat all. However, I think >>>> we should move tmp_inv1. To use tmp_inv1 in non_inv1 we need phi node >>>> (otherwise we get verification failure as bb with conditionally moved >>>> tmp_inv1 doesn;t dominate bb with non_inv1). Another example if >>>> non_inv1 is invariant but has not enough cost or something else. In >>>> short, if it's still used in initial or others loops. Say, non_inv1 is >>>> invariant and moved in another loop. Or something else. >>>> >>>> So, when I read >>>> 'Only PHI nodes need PHI node creation' >>>> and >>>> 'That is, if you move a conditionally executed statement but not any >>>> PHI node it feeds then the movement is useless.' >>>> I think about this example and don't understand. Could you clarify? >>> >>> Ok, so this case is already handled by LIM by simply moving tmp_inv1 = m*n >>> (and making it unconditionally execute before the loop). In fact whether >>> we can move it does not depend on whether 'cond' is invariant or not. >>> >>> If we couldn't execute m*n unconditionally on the loop preheader edge >>> then we'd have to be able to also move the guarding condition. Note >>> that cost considerations shouldn't come into play here. >>> >>> So I was only thinking of the case where we also want to move the >>> condition (which is wanted if we want to move a PHI node). >>> >> >> Don't understand "So I was only thinking of the case where we also >> want to move the condition (which is wanted if we want to move a PHI >> node)." >> >> What about the case when we want to move condition but don't have phi >> node to move. >> >> Let's consider a little bit modified example: >> >> void bar (long); >> void foo (int cond1, int cond2, int cond3, int m, int n) >> { >> unsigned x; >> unsigned inv1; >> >> for (x = 0; x < 1000; ++x) >> { >> if (n) >> { >> int tmp_inv1 = m/n; >> non_inv1 = tmp_inv1 + x; >> } >> } >> bar (non_inv1); >> } >> We couldn't move tmp_inv1 uncoditionally. Right (I have no clean >> build at hands right now to check. If I'm wrong let's replace m/n to >> something that couldn't be unconditionally moved :))? There is no phi >> node for tmp_inv1. >> Could you clarify what conditional lim should do in your opinion here? > > Nothing at this point. If only then for the simple reason to make the > first iteration of the patch simpler (still covering most cases). > > You seem to want to catch 100% of all possible cases. > >> Is it too little gain to move statements that only used in the same bb >> or branch (I mean before phi node creation, even if stmt had phi >> node)? >> >> In fact, we could have here 2 thousands of divisions: >> if (n) >> { >> int tmp_inv1 = m/n; >> tmp_inv2 = something_which_couldn't_be moved_unconditionally (tmp_inv1); >> ... >> tmp_invN = something_which_couldn't_be moved_unconditionally >> (tmp_invN-1); >> non_inv1 = tmp_invN + x; >> } > > Of course we could. But the patch didn't even exercise this case (no > testcase).
Patch handles this case (this is why I created phi nodes which we tried to discuss). Your comment is about test case not about the patch. > > Please make patches simple - support for moving non-PHIs can be added > as a followup. The first important part is to make the current code > (moving PHIs) > create control-flow rather than if-converted form. > > Richard. > >> >>>>>> >>>>>> 3.3 replace_uses_in_conditions >>>>>> >>>>>> After computations movements we can have several copies of the cond >>>>>> stmt. In 3.1 we do replacement of uses based on stmt’s tgt_level. For, >>>>>> cond stmt it doesn’t work. As cond stmt, of course, have something in >>>>>> lim_aux_data->tgt_level, but, in fact, they are moved only if >>>>>> corresponding invariants are moved. So, in fact, the same cond (copies >>>>>> of it, of course) could go to the different levels. So to fix these >>>>>> uses, we call replace_uses_in_conditions. >>>>> >>>>> Hmm, but stmts that a condition depends on always have at least the >>>>> target level of the condition statement. Thus their computation is >>>>> available in all other "same" conditon statements that might appear >>>>> in more inner loop levels, no? So isn't this just about performing the >>>>> movement in a proper ordering? >>>> >>>> Let me give your more details, then may be you repeat question in other >>>> words. >>>> for () >>>> { >>>> if (a) >>>> { >>>> inv1 = some_computations; >>>> if (inv1 < inv2) >>>> { >>>> inv3; >>>> inv4; >>>> } >>>> } >>>> } >>>> >>>> So, the order for movement is: inv1; inv3; inv4. When we move inv1 we >>>> have tgt_level for inv3 and inv4 computed but we have not created >>>> copies of 'if (inv1 < inv2)' created. In theory, we can have several >>>> copies:(1) for tgt_level of inv3 (2) for tgt_level of inv2 (3) in >>>> initial cycle. if (1) is in tgt level of inv1 we need no replacement, >>>> in other cases we need replacements. Of course, we know this(all the >>>> info about tgt_levels) when we move inv1, but then we need to create >>>> all needed copies (store it somehow, etc) of 'if (inv1 < inv2') and >>>> make replacement in these copies. I don't see now why this is much >>>> better. This doesn't mean that current solution is perfect, just >>>> clarify your thought. >>> >>> I still don't get what "replacements" actually mean. Let's extend the >>> example to >>> >>> for () (A) >>> { >>>> for () (B) >>>> { >>>> if (a) >>>> { >>>> inv1 = some_computations; >>>> if (inv1 < inv2) >>>> { >>>> inv3; >>>> inv4; >>>> } >>>> } >>>> } >>> } >>> >>> then you say the target level of inv1 could be 'A' and the target >>> level of inv3 'B' >>> and that of inv4 is 'B' as well. >>> >>> That's true. So we move inv1 to before 'A' and inv3/inv4 to before 'B' >>> guarded >>> with if (inv1 < inv2). I fail to see what "copies" we need here. 'inv1' is >>> available in all target levels below its own, no copies needed. >>> >> >> I meant the copies of conditions (COND_STMT: inv1 < inv2). >> >> Let's modify this example: tgt_level (inv1) = A; tgt_level(inv3) = A; >> tgt_level(inv4) = B. >> >> Then >> if (a) >> inv1=some_computations; >> if (inv1 < inv2) >> inv3; >> <preheader edge for A, where phi for inv1(phi_for_inv1) is computed> >> for () (A) >> { >> if (phi_for_inv1 <inv2) >> inv3 >> for () (B) >> { >> if (a) >> { >> if (phi_for_inv1 < inv2) >> { >> } >> } >> } >> So, we have 3 copies of inv1 < inv2 after lim (in some of them we use >> inv1, in other phi_for_inv1). But this is the second priority for now >> (this is all about whether we should move statements w/o phi node). >> >>>>> >>>>>> More details on 3.1 and 3.2 >>>>>> >>>>>> 3.1 The patch is very similar for phi stmts and regular stmts (there >>>>>> is some difference for regular stmts related to the case when we move >>>>>> sequence instead of single stmt, let’s skip it in this description, >>>>>> will describe later. BTW, there are comments, in corresponding parts). >>>>>> >>>>>> a) for each bb we call check_conditionally_executed and if bb is cond >>>>>> executed then invariant statement will be moved conditionally. >>>>>> >>>>>> b) prepare_edge_for_stmt collects information needed to create phi >>>>>> nodes in 3.2(add_info_for_phi) AND prepares edge ie creates if() >>>>>> structure needed to move statement >>>>>> (get_block_for_predicated_statement, see below). All of this is done >>>>>> for cond executed stmt. Nothing is done for non cond executed. >>>>>> >>>>>> BTW, there is some strange names for functions like missed ending >>>>>> (prepare_edge_for_stm, w/o t in the end, this is because of my script >>>>>> to add ‘ ‘ before ‘(‘, will fix in the next patch version, ignore >>>>>> please so far). >>>>>> >>>>>> Get_block_for_predicated_stmt: >>>>>> >>>>>> Create bb where to add cond stmt to (or return existing, each loop has >>>>>> a map (level_cond_data_map is map <loop, cond_data_map> , >>>>>> cond_data_map is map <gimple, basic_block>) to identify if bb for >>>>>> given cond was created or not for given level). >>>>>> >>>>>> Parameters of this method: >>>>>> >>>>>> Cond – condition of stmt, branch – is it on true or false edge of >>>>>> cond, cond_map - see above, level (tgt_level), preheader – >>>>>> preheader_edge for level. >>>>>> >>>>>> So first of all we check whether there is a basic block for given >>>>>> condition in given loop. >>>>>> >>>>>> a) If such bb exists then we trying to find corresponding edge, >>>>>> destination of this edge shouldn’t be a post dominator of bb >>>>>> (find_branch_edge). >>>>>> >>>>>> a.1) if such edge exists we return the most remote in this branch bb >>>>>> (or create it), see get_most_remote_in_branch; >>>>>> >>>>>> a.2) if such edge doesn’t exists, we create corresponding bb, see >>>>>> make_branch >>>>>> >>>>>> b) If such bb doesn’t exist we create it (recursively calling >>>>>> get_block_for_predicated_stmt if required bb is conditionally >>>>>> executed). >>>>>> >>>>>> Will not describe it further so far (let me know if you would like to >>>>>> read additional details). >>>>>> >>>>>> Add_info_for_phi >>>>>> >>>>>> The main goal of this method is to identify situation where >>>>>> stmt(parameter,ie the statement which we are moving) or more exactly >>>>>> lhs of this statement >>>>>> >>>>>> a) is still used in other loops, so we need to create phi node to >>>>>> use phi name inside other loops. >>>>>> >>>>>> b) Is used in phi node of original loop and this phi node is >>>>>> going to be moved. BUT, as we move phi node as assignment (in the case >>>>>> of phi node with 2 arguments) we need to create phi node to use lhs of >>>>>> statement (param of add_info_for_phi). >>>>>> >>>>>> In these cases we make corresponding replacements and store >>>>>> information needed to create phi nodes and make some other >>>>>> replacements in 3.3 (replace_uses_in_conditions). >>>>>> >>>>>> Add_info_for_phi calls create_tmp_var which requires some explanation. >>>>>> >>>>>> Create_tmp_var >>>>>> >>>>>> To create new names for phi nodes and to create default def for phi >>>>>> arguments I use make_ssa_name and get_or_create_ssa_default_def. These >>>>>> methods have some asserts (being merged): (TREE_CODE (old_var) == >>>>>> VAR_DECL || TREE_CODE (old_var) == PARM_DECL || TREE_CODE >>>>>> (old_var) == RESULT_DECL). >>>>>> >>>>>> So in some cases (when I know that I need to use methods mentioned >>>>>> above, see the start of create_tmp_var, ie the uses in other loops) I >>>>>> create new tmp variable to overcome these asserts. To be honest I >>>>>> don’t like this but don’t know how to deal with this better. >>>>> >>>>> Hmm, don't you just run into SSA names here? That is, TREE_CODE >>>>> (old_var) == SSA_NAME? >>>>> You can use make_ssa_name (TREE_TYPE (old_var), "plim") in this case. >>>>> >>>>> Or you run into the issue that anonymous SSA names cannot have default >>>>> defintions (I don't understand why you need all the fuss about default >>>>> defs). >>>> >>>> I used default defs to write something in phi args when I have no values on >>>> corresponding branches. Say there is some tmp value which I should >>>> create phi for. >>>> if (a) >>>> { >>>> tmpvar=something >>>> b = something(tmpvar); >>>> } >>>> else >>>> { >>>> } >>>> When I move if (a) {tmpvar = something} and create phi for tmpvar (to >>>> use phi result >>>> in the initial cycle) i need to write something for 'not a' phi arg. >>>> BUT, in some cases I use >>>> build_zero_cst (which I started to use after default defs use. I mean >>>> I didn't know about >>>> build_zero_cst while I started to use default defs.). So may be I can >>>> replace >>>> uses of default defs in such cases by build_zero_cst? I mean just SOME >>>> values >>>> (as I know that control flow will never go to this places actually). I >>>> mean if !a tmpvar will not used. >>> >>> Ok, this is again about the case where you are not moving a PHI node but >>> creating it because you want to conditionally execute invariant stmts. >>> Yes, a default def is ok here, but you can always use a new VAR_DECL >>> just for the default def creation like gimple_fold_call does for example: >>> >>> tree var = create_tmp_var (TREE_TYPE (lhs)); >>> tree def = get_or_create_ssa_default_def (cfun, >>> var); >>> >>> Note that I think we don't need to bother about this case as we'd only >>> want to move PHI nodes in the first place (so you already have values >>> for all PHI args). >>> >>>>> >>>>>> And a couple of words regarding where we store info for phi nodes: >>>>>> >>>>>> Each loop contains in its aux phi_data structure. On this stage we >>>>>> only add there stmt (if phi node will be required), phi name=phi node >>>>>> result name in names_map or fl_names_map). See the comment about >>>>>> phi_data in patch. >>>>>> >>>>>> If there should be created several phi nodes for given lhs (I mean the >>>>>> first place for phi node doesn’t dominate on corresponding preheader, >>>>>> we add only the last name in names_map (as intermediate names could be >>>>>> created later, but the last name in bb which dominates preheader >>>>>> should be used for replacement in other loops. Replacements were >>>>>> already made in walker). >>>>>> >>>>>> If lhs is used only in phi node, which are moved and transformed into >>>>>> assignment, and there is no uses in other loops, we need to create >>>>>> only first level phi node (don’t need to create phi nodes till some bb >>>>>> which dominates preheader), then we add such name to fl_names_map. >>>>>> >>>>>> 3.2 Create_phi_nodes >>>>>> >>>>>> For each of the loops we go over ((phi_data*) loop->aux)->stmts. These >>>>>> are statements which were moved, but they have uses in the original >>>>>> loop (more exactly, their uses in some other loops replaced by some >>>>>> name from ((phi_data*) loop->aux)->names_map or ((phi_data*) >>>>>> loop->aux)->fl_names_map. >>>>>> >>>>>> So for each of such stmts we create phi nodes (for its lhs) chain. >>>>>> >>>>>> Basic block for phi node is found in get_bb_for_phi for given bb with >>>>>> stmt. If basic block for phi node dominates preheader->src we end >>>>>> chain creation. There is some special condition for the case when we >>>>>> need to create only first level phi node. (I described this situation >>>>>> above, but let me know If I need to add more details. >>>>>> >>>>>> Basic blocks can have maps to identify if we created phi node for >>>>>> given variable in given basic_block, so we only add an edge argument >>>>>> for such case (phi_map = new hash_map<basic_block, hash_map<tree, >>>>>> gphi*>*>). >>>>> >>>>> Ok. >>>>> >>>>> Well - I think it might be easier code-generation-wise to have a separate >>>>> phase move_phis executed before move_computations, that just moves >>>>> invariant PHI nodes (which this all is about - see above) and their >>>>> controlling conditions and only then move their dependencies. >>>> I didn;t understand so far (I think let's clarify previous >>>> questions,especially >>>> about phi nodes then let's go back to this one.) >>> >>> Yeah, I think the PHI node thing is our major mis-understanding (or the >>> main thing that I think complicates the patch for too little gain - at least >>> initially). >>> >>> Thanks, >>> Richard. >>> >>>>> >>>>> Thanks, >>>>> Richard. >>>>> >>>>>> In short, that’s all. >>>>>> >>>>>> Thanks, >>>>>> >>>>>> Evgeniya >>>>>> >>>>>> >>>>>> >>>>>> On Sat, May 9, 2015 at 1:07 AM, Evgeniya Maenkova >>>>>> <evgeniya.maenk...@gmail.com> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Could you please review my patch for predicated lim? >>>>>>> >>>>>>> Let me note some details about it: >>>>>>> >>>>>>> >>>>>>> >>>>>>> 1) Phi statements are still moved only if they have 1 or 2 >>>>>>> arguments. However, phi statements could be move under conditions (as >>>>>>> it’s done for the other statements). Probably, phi statement motion >>>>>>> with 3 + arguments could be implemented in the next patch after >>>>>>> predicated lim. >>>>>>> >>>>>>> 2) Patch has limitations/features like (it was ok to me to >>>>>>> implement it such way, maybe I’m not correct. ): >>>>>>> >>>>>>> a) Loop1 >>>>>>> >>>>>>> { >>>>>>> >>>>>>> If (a) >>>>>>> >>>>>>> Loop2 >>>>>>> >>>>>>> { >>>>>>> >>>>>>> Stmt - Invariant for Loop1 >>>>>>> >>>>>>> } >>>>>>> >>>>>>> } >>>>>>> >>>>>>> In this case Stmt will be moved only out of Loop2, because of if >>>>>>> (a). >>>>>>> >>>>>>> b) Or >>>>>>> >>>>>>> Loop1 >>>>>>> >>>>>>> { >>>>>>> >>>>>>> … >>>>>>> >>>>>>> If (cond1) >>>>>>> >>>>>>> If (cond2) >>>>>>> >>>>>>> If (cond3) >>>>>>> >>>>>>> Stmt; >>>>>>> >>>>>>> } >>>>>>> >>>>>>> Stmt will be moved out only if cond1 is always executed in Loop1. >>>>>>> >>>>>>> c) It took me a long time to write all of these code, so there >>>>>>> might be other peculiarities which I forgot to mention. :) >>>>>>> >>>>>>> Let’s discuss these ones as you will review my >>>>>>> patch. >>>>>>> >>>>>>> 3) Patch consists of 9 files: >>>>>>> >>>>>>> a) gcc/testsuite/gcc.dg/tree-ssa/loop-7.c, >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – changed tests: >>>>>>> >>>>>>> - gcc/testsuite/gcc.dg/tree-ssa/loop-7.c changed as >>>>>>> predicated lim moves 2 more statements out of the loop; >>>>>>> >>>>>>> - gcc/testsuite/gcc.dg/tree-ssa/recip-3.c – with conditional >>>>>>> lim recip optimization in this test doesn’t work (the corresponding >>>>>>> value is below threshold as I could see in the code for recip, 1<3). >>>>>>> So to have recip working in this test I changed test a little bit. >>>>>>> >>>>>>> b) gcc/tree-ssa-loop-im.c – the patched lim per se >>>>>>> >>>>>>> c) gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-13.c, >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-14.c, >>>>>>> >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-15.c, >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c, >>>>>>> >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-17.c, >>>>>>> gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-18.c >>>>>>> >>>>>>> the tests for conditional lim. >>>>>>> >>>>>>> 4) Patch testing: >>>>>>> >>>>>>> a) make –k check (no difference in results for me for the clean >>>>>>> build and the patched one, >>>>>>> >>>>>>> - Revision: 222849, >>>>>>> >>>>>>> - uname -a >>>>>>> >>>>>>> Linux Istanbul 3.16.0-23-generic #31-Ubuntu SMP Tue Oct >>>>>>> 21 18:00:35 UTC 2014 i686 i686 i686 GNU/Linux >>>>>>> >>>>>>> b) Bootstrap. >>>>>>> >>>>>>> It goes well now, however to fix it I have made a temporary hack in >>>>>>> the lim code. And with this fix patch definitely shouldn’t be >>>>>>> committed. >>>>>>> >>>>>>> I did so, as I would like to discuss this issue first. >>>>>>> >>>>>>> The problem is: I got stage2-stage3 comparison failure on the single >>>>>>> file (tree-vect-data-refs.o). After some investigation I understood >>>>>>> that tree-vect-data-refs.o differs being compiled with and without >>>>>>> ‘-g’ option (yes, more exactly on stage 2 this is ‘-g –O2 –gtoggle’, >>>>>>> and for stage 3 this is ‘-g –O2’. But to simplify things I can >>>>>>> reproduce this difference on the same build (even not bootstrapped), >>>>>>> with option ‘ –g’ and without it). >>>>>>> >>>>>>> Of course, the file compiled with –g option will contain debug >>>>>>> information and will differ from the corresponding file without debug >>>>>>> information. I mean there is the difference reported by >>>>>>> contrib/compare-debug (I mean we talk about stripped files). >>>>>>> >>>>>>> Then I compared assemblers and lim dump files and made a conclusion >>>>>>> the difference is: >>>>>>> >>>>>>> There is statement _484=something, which is conditionally moved out of >>>>>>> loop X. In non debug cases no usages of _484 are left inside the loop >>>>>>> X. In debug case, there is the single use in debug statement. So for >>>>>>> debug case my patch generates phi statement for _484 to make it >>>>>>> available inside the loop X. For non debug case, no such phi statement >>>>>>> generated as there is no uses of _484. >>>>>>> >>>>>>> There is one more statement with lhs=_493, with exactly this pattern >>>>>>> of use. But this is not important for our discussion. >>>>>>> >>>>>>> >>>>>>> >>>>>>> So, in my opinion it’s ok to generate additional phi node for debug >>>>>>> case. But I’m not a compiler expert and maybe there is some >>>>>>> requirement that debug and non-debug versions should differ only by >>>>>>> debug statements, I don’t know. >>>>>>> >>>>>>> >>>>>>> >>>>>>> My temporary hack fix is just removing of such debug statements (no >>>>>>> debug statements, no problems J). >>>>>>> >>>>>>> >>>>>>> >>>>>>> The alternatives I see are: >>>>>>> >>>>>>> - Move debug statements outside the loop (not good in my >>>>>>> opinion); >>>>>>> >>>>>>> - Somehow reset values in debug statements (not good in my >>>>>>> opinion, if I could provide correct information for them); >>>>>>> >>>>>>> - Generate phi for debug statements and fix bootstrap (say, >>>>>>> let’s have some list of files, which we have special check for. I mean >>>>>>> for my case, the difference is not in stage2 and stage3, it’s in debug >>>>>>> and non-debug). I like this variant. However, as I don’t see such list >>>>>>> of special files in the whole gcc, I think I might be missing >>>>>>> something. >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> >>>>>>> >>>>>>> Evgeniya >> >> >> >> -- >> Thanks, >> >> Evgeniya >> >> perfstories.wordpress.com -- Thanks, Evgeniya perfstories.wordpress.com