Richard, I did simple change by saving gsi iterator for each bb that has critical edges by adding additional field to bb_predicate_s:
typedef struct bb_predicate_s { /* The condition under which this basic block is executed. */ tree predicate; /* PREDICATE is gimplified, and the sequence of statements is recorded here, in order to avoid the duplication of computations that occur in previous conditions. See PR44483. */ gimple_seq predicate_gimplified_stmts; /* Insertion point for blocks having incoming critical edges. */ gimple_stmt_iterator gsi; } *bb_predicate_p; and this iterator is saved in insert_gimplified_predicates before insertion code for predicate computation. I checked that this fix works. Now I am implementing merging of predicate_extended.. and predicate_arbitrary.. functions as you proposed. Best regards. Yuri. 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: > On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Thanks Richard for your quick reply! >> >> 1. I agree that we can combine predicate_extended_ and >> predicate_arbitrary_ to one function as you proposed. >> 2. What is your opinion about using more simple decision about >> insertion point - if bb has use of phi result insert phi predication >> before it and at the bb end otherwise. I assume that critical edge >> splitting is not a good decision. > > Why not always insert before the use? Which would be after labels, > what we do for two-arg PHIs. That is, how can it be that you predicate > a PHI in BB1 and then for an edge predicate on one of its incoming > edges you get SSA uses with defs that are in BB1 itself? That > can only happen for backedges but those you can't remove in any case. > > Richard. > >> >> Best regards. >> Yuri. >> >> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>> Hi Richard, >>>> >>>> I resend you patch1 and patch2 with minor changes: >>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>> I also very sorry that I sent you bad patch. >>>> >>>> Now let me answer on your questions related to second patch. >>>> 1. Why we need both predicate_extended_scalar_phi and >>>> predicate_arbitrary_scalar_phi? >>>> >>>> Let's consider the following simple test-case: >>>> >>>> #pragma omp simd safelen(8) >>>> for (i=0; i<512; i++) >>>> { >>>> float t = a[i]; >>>> if (t > 0.0f & t < 1.0e+17f) >>>> if (c[i] != 0) /* c is integer array. */ >>>> res += 1; >>>> } >>>> >>>> we can see the following phi node correspondent to res: >>>> >>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>> >>>> It is clear that we can optimize it to phi node with 2 arguments only >>>> and only one check can be used for phi predication (for reduction in >>>> our case), namely predicate of bb_5. In general case we can't do it >>>> even if we sort all phi argument values since we still have to produce >>>> a chain of cond expressions to perform phi predication (see comments >>>> for predicate_arbitrary_scalar_phi). >>> >>> How so? We can always use !(condition) for the "last" value, thus >>> treat it as an 'else' case. That even works for >>> >>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>> >>> where the condition for edges 5 and 7 can be computed as >>> ! (condition for 3 || condition for 4). >>> >>> Of course it is worthwhile to also sort single-occurances first >>> so your case gets just the condiiton for edge 5 and its inversion >>> used for edges 3 and 4 combined. >>> >>>> 2. Why we need to introduce find_insertion_point? >>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>> only critical incoming edges and both contain code computing edge >>>> predicates, e.g. >>>> >>>> <bb 7>: >>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>> _46 = xmax_17 == xmax_37; >>>> _47 = xmax_17 == xmax_27; >>>> _48 = _46 & _47; >>>> _53 = xmax_17 == xmax_37; >>>> _54 = ~_53; >>>> _55 = xmax_17 == xmax_27; >>>> _56 = _54 & _55; >>>> _57 = _48 | _56; >>>> xmax_edge_19 = xmax_edge_39 + 1; >>>> goto <bb 11>; >>>> >>>> It is evident that we can not put phi predication at the block >>>> beginning but need to put it after predicate computations. >>>> Note also that if there are no critical edges for phi arguments >>>> insertion point will be "after labels" Note also that phi result can >>>> have use in this block too, so we can't put predication code to the >>>> block end. >>> >>> So the issue is that predicate insertion for edge predicates does >>> not happen on the edge but somewhere else (generally impossible >>> for critical edges unless you split them). >>> >>> I think I've told you before that I prefer simple solutions to such issues, >>> like splitting the edge! Certainly not involving a function walking >>> GENERIC expressions. >>> >>> Thanks, >>> Richard. >>> >>>> Let me know if you still have any questions. >>>> >>>> Best regards. >>>> Yuri. >>>> >>>> >>>> >>>> >>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>> wrote: >>>>>> Hi All, >>>>>> >>>>>> Here is the second patch related to extended predication. >>>>>> Few comments which explain a main goal of design. >>>>>> >>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>> lead to less efficient binaries. >>>>>> 2. One special case of extended PHI node predication was introduced >>>>>> when #arguments is more than 2 but only two arguments are different >>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>> scalar reduction is applied. >>>>>> This is correspondent to the following statement: >>>>>> if (q1 && q2 && q3) var++ >>>>>> New function phi_has_two_different_args was introduced to detect such >>>>>> phi. >>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>> least one incoming edge for blocks containing PHI is not critical - it >>>>>> guarantees that all computations related to predicate of normal edge >>>>>> are already inserted above this block and >>>>>> code related to PHI predication can be inserted at the beginning of >>>>>> block. But this is not true for critical edges for which predicate >>>>>> computations are in the block where code for phi predication must be >>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>> simply found out the last statement in block defining predicates >>>>>> correspondent to all incoming edges and insert phi predication code >>>>>> after it (with some minor exceptions). >>>>> >>>>> Unfortunately the patch doesn't apply for me - I get >>>>> >>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>> predicate_all_scalar_phis (struct loop *loop) >>>>> >>>>> a few remarks nevertheless. I don't see how we need both >>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>> >>>>> I don't understand the need for find_insertion_point. All SSA names >>>>> required for the predicates are defined upward - and the complex CFG >>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>> inserted code if you insert after labels just like for the other case. >>>>> Or what am I missing? ("flattening" of the basic-blocks of course needs >>>>> to happen in dominator order - but I guess that happens already?) >>>>> >>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>> times that would have been nice to vectorize. I suggest to >>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>> followup, but please rename the local flag_force_vectorize flag >>>>> to something less looking like a flag, like simply 'aggressive'. >>>>> >>>>> Otherwise patch 2 looks ok to me. >>>>> >>>>> Richard. >>>>> >>>>> >>>>>> ChangeLog: >>>>>> >>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> >>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>> non-critical incoming edge. >>>>>> (phi_has_two_different_args): New function. >>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>> arguments if EXTENDED is true. Change check that block >>>>>> containing reduction statement candidate is predecessor >>>>>> of phi-block since phi may have more than two arguments. >>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>> statement before/after gsi point. >>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>> true (which correspondent to argument BEFORE) to call of >>>>>> convert_scalar_cond_reduction. >>>>>> (get_predicate_for_edge): New function. >>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>> (predicate_extended_scalar_phi): New function. >>>>>> (find_insertion_point): New function. >>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>> EXTENDED value. >>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>> may have statements to insert. Insert predicate of BB just after label >>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>>>> is copy of inner or outer loop field force_vectorize.