On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > I am sending you full patch (~1000 lines) but if you need only patch.1 > and patch.2 will let me know and i'll send you reduced patch. > > Below are few comments regarding your remarks for patch.3. > > 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case > when dead code elimination is required to vectorize loop, i.e. dead > statement is marked as relevant. > 2. You wrote: >> The "retry" code also looks odd - why do you walk the BB multiple >> times instead of just doing sth like >> >> while (!has_single_use (lhs)) >> { >> gimple copy = ifcvt_split_def_stmt (def_stmt); >> ifcvt_walk_pattern_tree (copy); >> } >> >> thus returning the copy you create and re-process it (the copy should >> now have a single-use). > > The problem is that not only top SSA_NAME (lhs) may have multiple uses > but some intermediate variables too. For example, for the following > test-case > > float a[1000]; > int c[1000]; > > int foo() > { > int i, res = 0; > #pragma omp simd safelen(8) > for (i=0; i<512; i++) > { > float t = a[i]; > if (t > 0.0f & t < 1.0e+17f) > if (c[i] != 0) > res += 1; > } > return res; > } > > After combine_blocks we have the following bb: > > <bb 3>: > # res_15 = PHI <res_1(7), 0(15)> > # i_16 = PHI <i_11(7), 0(15)> > # ivtmp_14 = PHI <ivtmp_13(7), 512(15)> > t_5 = a[i_16]; > _6 = t_5 > 0.0; > _7 = t_5 < 9.9999998430674944e+16; > _8 = _6 & _7; > _10 = &c[i_16]; > _ifc__32 = _8 ? 4294967295 : 0; > _9 = MASK_LOAD (_10, 0B, _ifc__32); > _28 = _8; > _29 = _9 != 0; > _30 = _28 & _29; > _ifc__31 = _30 ? 1 : 0; > res_1 = res_15 + _ifc__31; > i_11 = i_16 + 1; > ivtmp_13 = ivtmp_14 - 1; > if (ivtmp_13 != 0) > goto <bb 7>; > else > goto <bb 8>; > > and we can see that _8 has multiple uses. Also note that after splitting of > _8 = _6 & _7 > we also get multiple uses for definition of _6 and _7. So I used this > iterative algorithm as the simplest one.
But it walks the entire pattern again and again while you only need to ensure you walk the pattern tree of the now single-use DEF again (in fact, rather than replacing a random USE in ifcvt_split_def_stmt you should pass down the user_operand_p that you need to make single-use). > I think it would be nice to re-use some utility from tree-vect-patterns.c > for stmt_is_root_of_bool_pattern. > > I assume that function stmt_is_root_of_bool_pattern can be simplified > to check on COND_EXPR only since PHI predication and memory access > predication produced only such statements,i.e. it can look like > > static bool > stmt_is_root_of_bool_pattern (gimple stmt, tree *var) > { > enum tree_code code; > tree lhs, rhs; > > code = gimple_assign_rhs_code (stmt); > if (code == COND_EXPR) > { > rhs = gimple_assign_rhs1 (stmt); > if (TREE_CODE (rhs) != SSA_NAME) > return false; > *var = rhs; > return true; > } > return false; > } > > I also did few minor changes in patch.2. > > 3. You can also notice that I inserted code in tree_if_conversion to > do loop version if explicit option "-ftree-loop-if-convert" was not > passed to compiler, i.e. we perform if-conversion for loop > vectorization only and if it does not take place, we should delete > if-converted version of loop. > What is your opinion? Overall part 1 and part 2 look good to me, predicate_scalar_phi looks in need of some refactoring to avoid duplicate code. We can do that a followup. Part 3 still needs the iteration to be resolved and make the use we actually care about single-use, not a random one so we can avoid iterating completely. Richard. > Thanks. > Yuri. > > 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Hi Richard, >>> >>> Here is updated patch which includes >>> (1) split critical edges for aggressive if conversion. >>> (2) delete all stuff related to support of critical edge predication. >>> (3) only one function - predicate_scalar_phi performs predication. >>> (4) function find_phi_replacement_condition was deleted since it was >>> included in predicate_scalar_phi for phi with two arguments. >>> >>> I checked that patch works in stress testing mode, i.e. with >>> aggressive if conversion by default. >>> >>> What is your opinion? >> >> Looks ok overall, but please simply do >> >> FOR_EACH_EDGE (e, ei, bb->succs) >> if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop) >> split_edge (e); >> >> for all blocks apart from the latch. >> >> Can you please send a combined patch up to this one? Looking at >> the incremental diff is somewhat hard. Thus a patch including all >> patches from patch1 to this one. >> >> Thanks, >> Richard. >> >>> >>> Thanks. >>> Yuri. >>> >>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> Thanks for your reply! >>>>> >>>>> I didn't understand your point: >>>>> >>>>> Well, I don't mind splitting all critical edges unconditionally >>>>> >>>>> but you do it unconditionally in proposed patch. >>>> >>>> I don't mind means I am fine with it. >>>> >>>>> Also I assume that >>>>> call of split_critical_edges() can break ssa. For example, we can >>>>> split headers of loops, loop exit blocks etc. >>>> >>>> How does that "break SSA"? You mean loop-closed SSA? I'd >>>> be surprised if so but that may be possible. >>>> >>>>> I prefer to do something >>>>> more loop-specialized, e.g. call edge_split() for critical edges >>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge >>>>> destination bb belongs to loop). >>>> >>>> That works for me as well but it is more complicated to implement. >>>> Ideally you'd only split one edge if you find a block with only critical >>>> predecessors (where we'd currently give up). But note that this >>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it >>>> will change loop->num_nodes so we have to be more careful in >>>> constructing the loop calling if_convertible_bb_p. >>>> >>>> Richard. >>>> >>>>> >>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> Sorry that I forgot to delete debug dump from my fix. >>>>>>> I have few questions about your comments. >>>>>>> >>>>>>> 1. You wrote : >>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>> path >>>>>>> Did you mean that I must combine predicate_scalar_phi and >>>>>>> predicate_extended scalar phi to one function? >>>>>>> Please note that if additional flag was not set up (i.e. >>>>>>> aggressive_if_conv is false) extended predication is required more >>>>>>> compile time since it builds hash_map. >>>>>> >>>>>> It's compile-time complexity is reasonable enough even for >>>>>> non-aggressive if-conversion. >>>>>> >>>>>>> 2. About critical edge splitting. >>>>>>> >>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv >>>>>>> option only; (2) should we split all critical edges. >>>>>>> Note that this leads to recomputing of topological order. >>>>>> >>>>>> Well, I don't mind splitting all critical edges unconditionally, thus >>>>>> do something like >>>>>> >>>>>> Index: gcc/tree-if-conv.c >>>>>> =================================================================== >>>>>> --- gcc/tree-if-conv.c (revision 218515) >>>>>> +++ gcc/tree-if-conv.c (working copy) >>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f >>>>>> if (number_of_loops (fun) <= 1) >>>>>> return 0; >>>>>> >>>>>> + bool critical_edges_split_p = false; >>>>>> FOR_EACH_LOOP (loop, 0) >>>>>> if (flag_tree_loop_if_convert == 1 >>>>>> || flag_tree_loop_if_convert_stores == 1 >>>>>> || ((flag_tree_loop_vectorize || loop->force_vectorize) >>>>>> && !loop->dont_vectorize)) >>>>>> - todo |= tree_if_conversion (loop); >>>>>> + { >>>>>> + if (!critical_edges_split_p) >>>>>> + { >>>>>> + split_critical_edges (); >>>>>> + critical_edges_split_p = true; >>>>>> + todo |= TODO_cleanup_cfg; >>>>>> + } >>>>>> + todo |= tree_if_conversion (loop); >>>>>> + } >>>>>> >>>>>> #ifdef ENABLE_CHECKING >>>>>> { >>>>>> >>>>>>> It is worth noting that in current implementation bb's with 2 >>>>>>> predecessors and both are on critical edges are accepted without >>>>>>> additional option. >>>>>> >>>>>> Yes, I know. >>>>>> >>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more >>>>>> to it and even fix the critical edge missed optimization with splitting >>>>>> critical edges then I am all for that solution. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> Thanks ahead. >>>>>>> Yuri. >>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Richard, >>>>>>>>> >>>>>>>>> Here is updated patch2 with the following changes: >>>>>>>>> 1. Delete functions phi_has_two_different_args and >>>>>>>>> find_insertion_point. >>>>>>>>> 2. Use only one function for extended predication - >>>>>>>>> predicate_extended_scalar_phi. >>>>>>>>> 3. Save gsi before insertion of predicate computations for basic >>>>>>>>> blocks if it has 2 predecessors and >>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors >>>>>>>>> and at least one incoming edge >>>>>>>>> is critical. This saved iterator can be used by extended phi >>>>>>>>> predication. >>>>>>>>> >>>>>>>>> Here is motivated test-case which explains this point. >>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2 >>>>>>>>> -ftree-loop-vectorize -fopenmp options. >>>>>>>>> The problem phi is in bb-7: >>>>>>>>> >>>>>>>>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>>>>>>>> { >>>>>>>>> <bb 5>: >>>>>>>>> xmax_edge_18 = xmax_edge_36 + 1; >>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>> goto <bb 7>; >>>>>>>>> else >>>>>>>>> goto <bb 9>; >>>>>>>>> >>>>>>>>> } >>>>>>>>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>>>>>>>> { >>>>>>>>> <bb 6>: >>>>>>>>> if (xmax_17 == xmax_27) >>>>>>>>> goto <bb 7>; >>>>>>>>> else >>>>>>>>> goto <bb 8>; >>>>>>>>> >>>>>>>>> } >>>>>>>>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>>>>>>>> { >>>>>>>>> <bb 7>: >>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>> goto <bb 11>; >>>>>>>>> >>>>>>>>> } >>>>>>>>> >>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out >>>>>>>>> restoring gsi in predicate_all_scalar_phi: >>>>>>>>> #if 0 >>>>>>>>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>>>>>>>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>>>>>>>> gsi = bb_insert_point (bb); >>>>>>>>> else >>>>>>>>> #endif >>>>>>>>> gsi = gsi_after_labels (bb); >>>>>>>>> >>>>>>>>> we will get ICE: >>>>>>>>> t5.c: In function 'foo': >>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use >>>>>>>>> void foo (int n) >>>>>>>>> ^ >>>>>>>>> for SSA_NAME: _1 in statement: >>>>>>>>> _52 = _1 & _3; >>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed >>>>>>>>> >>>>>>>>> smce predicate computations were inserted in bb_7. >>>>>>>> >>>>>>>> The issue is obviously that the predicates have already been emitted >>>>>>>> in the target BB - that's of course the wrong place. This is done >>>>>>>> by insert_gimplified_predicates. >>>>>>>> >>>>>>>> This just shows how edge predicate handling is broken - we don't >>>>>>>> seem to have a sequence of gimplified stmts for edge predicates >>>>>>>> but push those to e->dest which makes this really messy. >>>>>>>> >>>>>>>> Rather than having a separate phase where we insert all >>>>>>>> gimplified bb predicates we should do that on-demand when >>>>>>>> predicating a PHI. >>>>>>>> >>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard >>>>>>>> the printfs properly. >>>>>>>> >>>>>>>> You also still have two functions for PHI predication. And the >>>>>>>> new extended variant doesn't commonize the 2-args and general >>>>>>>> paths. >>>>>>>> >>>>>>>> I'm not at all happy with this code. It may be existing if-conv codes >>>>>>>> fault but making it even worse is not an option. >>>>>>>> >>>>>>>> Again - what's wrong with simply splitting critical edges if >>>>>>>> aggressive_if_conv? I think that would very much simplify >>>>>>>> things here. Or alternatively use gsi_insert_on_edge and >>>>>>>> commit edge insertions before merging the blocks. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>>> ChangeLog is >>>>>>>>> >>>>>>>>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> * tree-if-conv.c : Include hash-map.h. >>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple >>>>>>>>> statement iterator. >>>>>>>>> (bb_insert_point): New function. >>>>>>>>> (set_bb_insert_point): New function. >>>>>>>>> (has_pred_critical_p): New function. >>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>> AGGRESSIVE_IF_CONV is true. >>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>> non-critical incoming edge. >>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>>>>>>>> Allow interchange PHI arguments if EXTENDED is false. >>>>>>>>> Change check that block containing reduction statement candidate >>>>>>>>> is predecessor of phi-block since phi may have more than two >>>>>>>>> arguments. >>>>>>>>> (predicate_scalar_phi): Add new arguments for call of >>>>>>>>> is_cond_scalar_reduction. >>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>> (struct phi_args_hash_traits): New type. >>>>>>>>> (phi_args_hash_traits::hash): New function. >>>>>>>>> (phi_args_hash_traits::equal_keys): New function. >>>>>>>>> (gen_phi_arg_condition): New function. >>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>>>>>>>> to true if BB containing phi has more than 2 predecessors or both >>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and >>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more >>>>>>>>> than 2 predecessors and atleast one incoming edge is critical. >>>>>>>>> Use standard gsi_after_labels otherwise. >>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to >>>>>>>>> true for BB with 2 predecessors and critical incoming edges either >>>>>>>>> number of predecessors is geater 2 and at least one incoming >>>>>>>>> edge is >>>>>>>>> critical. >>>>>>>>> Add check that non-predicated block may have statements to insert. >>>>>>>>> Insert predicate computation of BB just after label if >>>>>>>>> EXTENDED_PREDICATION is true. >>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>>>>>>>> is copy of inner or outer loop force_vectorize field. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> wrote: >>>>>>>>>>> Richard, >>>>>>>>>>> >>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has >>>>>>>>>>> critical edges by adding additional field to bb_predicate_s: >>>>>>>>>>> >>>>>>>>>>> typedef struct bb_predicate_s { >>>>>>>>>>> >>>>>>>>>>> /* The condition under which this basic block is executed. */ >>>>>>>>>>> tree predicate; >>>>>>>>>>> >>>>>>>>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>>>>>>>> recorded here, in order to avoid the duplication of >>>>>>>>>>> computations >>>>>>>>>>> that occur in previous conditions. See PR44483. */ >>>>>>>>>>> gimple_seq predicate_gimplified_stmts; >>>>>>>>>>> >>>>>>>>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>>>>>>>> gimple_stmt_iterator gsi; >>>>>>>>>>> } *bb_predicate_p; >>>>>>>>>>> >>>>>>>>>>> and this iterator is saved in insert_gimplified_predicates before >>>>>>>>>>> insertion code for predicate computation. I checked that this fix >>>>>>>>>>> works. >>>>>>>>>> >>>>>>>>>> Huh? I still wonder what the issue is with inserting everything >>>>>>>>>> after the PHI we predicate. >>>>>>>>>> >>>>>>>>>> Well, your updated patch will come with testcases for the testsuite >>>>>>>>>> that will hopefully fail if doing that. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Now I am implementing merging of predicate_extended.. and >>>>>>>>>>> predicate_arbitrary.. functions as you proposed. >>>>>>>>>>> >>>>>>>>>>> Best regards. >>>>>>>>>>> Yuri. >>>>>>>>>>> >>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev >>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>> Thanks Richard for your quick reply! >>>>>>>>>>>>> >>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>>>>>>>> 2. What is your opinion about using more simple decision about >>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi >>>>>>>>>>>>> predication >>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>>>>>>>> splitting is not a good decision. >>>>>>>>>>>> >>>>>>>>>>>> Why not always insert before the use? Which would be after labels, >>>>>>>>>>>> what we do for two-arg PHIs. That is, how can it be that you >>>>>>>>>>>> predicate >>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>>>>>>>> can only happen for backedges but those you can't remove in any >>>>>>>>>>>> case. >>>>>>>>>>>> >>>>>>>>>>>> Richard. >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards. >>>>>>>>>>>>> Yuri. >>>>>>>>>>>>> >>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener >>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev >>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>> Hi Richard, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>>>>>>>> 2. Use static cast for the first argument of >>>>>>>>>>>>>>> gimple_phi_arg_edge. >>>>>>>>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let's consider the following simple test-case: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>>>>> { >>>>>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>>>>>>>> res += 1; >>>>>>>>>>>>>>> } >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 >>>>>>>>>>>>>>> arguments only >>>>>>>>>>>>>>> and only one check can be used for phi predication (for >>>>>>>>>>>>>>> reduction in >>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't >>>>>>>>>>>>>>> do it >>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to >>>>>>>>>>>>>>> produce >>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see >>>>>>>>>>>>>>> comments >>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>>>>>>>> >>>>>>>>>>>>>> How so? We can always use !(condition) for the "last" value, >>>>>>>>>>>>>> thus >>>>>>>>>>>>>> treat it as an 'else' case. That even works for >>>>>>>>>>>>>> >>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>>>>>>>> >>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>>>>>>>> used for edges 3 and 4 combined. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c >>>>>>>>>>>>>>> is >>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes >>>>>>>>>>>>>>> has >>>>>>>>>>>>>>> only critical incoming edges and both contain code computing >>>>>>>>>>>>>>> edge >>>>>>>>>>>>>>> predicates, e.g. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> <bb 7>: >>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>>>>>>>> _48 = _46 & _47; >>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>>>>>>>> _54 = ~_53; >>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>>>>>>>> _56 = _54 & _55; >>>>>>>>>>>>>>> _57 = _48 | _56; >>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>>>>>>>> goto <bb 11>; >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi >>>>>>>>>>>>>>> result can >>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> block end. >>>>>>>>>>>>>> >>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>>>>>>>> for critical edges unless you split them). >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to >>>>>>>>>>>>>> such issues, >>>>>>>>>>>>>> like splitting the edge! Certainly not involving a function >>>>>>>>>>>>>> walking >>>>>>>>>>>>>> GENERIC expressions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Let me know if you still have any questions. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best regards. >>>>>>>>>>>>>>> Yuri. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>>>>>>> Hi All, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since >>>>>>>>>>>>>>>>> it may >>>>>>>>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was >>>>>>>>>>>>>>>>> introduced >>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are >>>>>>>>>>>>>>>>> different >>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI >>>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>>>>>>>> New function phi_has_two_different_args was introduced to >>>>>>>>>>>>>>>>> detect such phi. >>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption >>>>>>>>>>>>>>>>> that at >>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not >>>>>>>>>>>>>>>>> critical - it >>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of >>>>>>>>>>>>>>>>> normal edge >>>>>>>>>>>>>>>>> are already inserted above this block and >>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the >>>>>>>>>>>>>>>>> beginning of >>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which >>>>>>>>>>>>>>>>> predicate >>>>>>>>>>>>>>>>> computations are in the block where code for phi predication >>>>>>>>>>>>>>>>> must be >>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced >>>>>>>>>>>>>>>>> which is >>>>>>>>>>>>>>>>> simply found out the last statement in block defining >>>>>>>>>>>>>>>>> predicates >>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi >>>>>>>>>>>>>>>>> predication code >>>>>>>>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>>>>>>>> predicate_extended_scalar_phi and >>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi. >>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after >>>>>>>>>>>>>>>> value >>>>>>>>>>>>>>>> and handle equal values specially in >>>>>>>>>>>>>>>> predicate_extended_scalar_phi? >>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point. All SSA >>>>>>>>>>>>>>>> names >>>>>>>>>>>>>>>> required for the predicates are defined upward - and the >>>>>>>>>>>>>>>> complex CFG >>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will >>>>>>>>>>>>>>>> dominate the >>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the >>>>>>>>>>>>>>>> other case. >>>>>>>>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of >>>>>>>>>>>>>>>> course needs >>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens >>>>>>>>>>>>>>>> already?) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag >>>>>>>>>>>>>>>> even >>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args >>>>>>>>>>>>>>>> multiple >>>>>>>>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do >>>>>>>>>>>>>>>> this as >>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>>>>>>>> to something less looking like a flag, like simply >>>>>>>>>>>>>>>> 'aggressive'. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Richard. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> ChangeLog: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors >>>>>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose >>>>>>>>>>>>>>>>> access >>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means >>>>>>>>>>>>>>>>> non-extended >>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables >>>>>>>>>>>>>>>>> EXTENDED and >>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has >>>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. >>>>>>>>>>>>>>>>> Invoke >>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi >>>>>>>>>>>>>>>>> depending on >>>>>>>>>>>>>>>>> EXTENDED value. >>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated >>>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just >>>>>>>>>>>>>>>>> after label >>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of >>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which >>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.