On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > Here is updated patch2 with the following changes: > 1. Delete functions phi_has_two_different_args and find_insertion_point. > 2. Use only one function for extended predication - > predicate_extended_scalar_phi. > 3. Save gsi before insertion of predicate computations for basic > blocks if it has 2 predecessors and > both incoming edges are critical or it gas more than 2 predecessors > and at least one incoming edge > is critical. This saved iterator can be used by extended phi predication. > > Here is motivated test-case which explains this point. > Test-case is attached (t5.c) and it must be compiled with -O2 > -ftree-loop-vectorize -fopenmp options. > The problem phi is in bb-7: > > bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) > { > <bb 5>: > xmax_edge_18 = xmax_edge_36 + 1; > if (xmax_17 == xmax_27) > goto <bb 7>; > else > goto <bb 9>; > > } > bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) > { > <bb 6>: > if (xmax_17 == xmax_27) > goto <bb 7>; > else > goto <bb 8>; > > } > bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) > { > <bb 7>: > # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> > xmax_edge_19 = xmax_edge_39 + 1; > goto <bb 11>; > > } > > Note that both incoming edges to bb_7 are critical. If we comment out > restoring gsi in predicate_all_scalar_phi: > #if 0 > if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) > || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) > gsi = bb_insert_point (bb); > else > #endif > gsi = gsi_after_labels (bb); > > we will get ICE: > t5.c: In function 'foo': > t5.c:9:6: error: definition in block 4 follows the use > void foo (int n) > ^ > for SSA_NAME: _1 in statement: > _52 = _1 & _3; > t5.c:9:6: internal compiler error: verify_ssa failed > > smce predicate computations were inserted in bb_7.
The issue is obviously that the predicates have already been emitted in the target BB - that's of course the wrong place. This is done by insert_gimplified_predicates. This just shows how edge predicate handling is broken - we don't seem to have a sequence of gimplified stmts for edge predicates but push those to e->dest which makes this really messy. Rather than having a separate phase where we insert all gimplified bb predicates we should do that on-demand when predicating a PHI. Your patch writes to stderr - that's bad - use dump_file and guard the printfs properly. You also still have two functions for PHI predication. And the new extended variant doesn't commonize the 2-args and general paths. I'm not at all happy with this code. It may be existing if-conv codes fault but making it even worse is not an option. Again - what's wrong with simply splitting critical edges if aggressive_if_conv? I think that would very much simplify things here. Or alternatively use gsi_insert_on_edge and commit edge insertions before merging the blocks. Thanks, Richard. > ChangeLog is > > 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> > > * tree-if-conv.c : Include hash-map.h. > (struct bb_predicate_s): Add new field to save copy of gimple > statement iterator. > (bb_insert_point): New function. > (set_bb_insert_point): New function. > (has_pred_critical_p): New function. > (if_convertible_bb_p): Allow bb has more than 2 predecessors if > AGGRESSIVE_IF_CONV is true. > (if_convertible_bb_p): Delete check that bb has at least one > non-critical incoming edge. > (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. > Allow interchange PHI arguments if EXTENDED is false. > Change check that block containing reduction statement candidate > is predecessor of phi-block since phi may have more than two arguments. > (predicate_scalar_phi): Add new arguments for call of > is_cond_scalar_reduction. > (get_predicate_for_edge): New function. > (struct phi_args_hash_traits): New type. > (phi_args_hash_traits::hash): New function. > (phi_args_hash_traits::equal_keys): New function. > (gen_phi_arg_condition): New function. > (predicate_extended_scalar_phi): New function. > (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it > to true if BB containing phi has more than 2 predecessors or both > incoming edges are critical. Invoke find_phi_replacement_condition and > predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB > has 2 predecessors and both incoming edges are critical or it has more > than 2 predecessors and atleast one incoming edge is critical. > Use standard gsi_after_labels otherwise. > Invoke predicate_extended_scalar_phi if EXTENDED is true. > (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION > to save gsi before insertion of predicate computations. SEt-up it to > true for BB with 2 predecessors and critical incoming edges either > number of predecessors is geater 2 and at least one incoming edge is > critical. > Add check that non-predicated block may have statements to insert. > Insert predicate computation of BB just after label if > EXTENDED_PREDICATION is true. > (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which > is copy of inner or outer loop force_vectorize field. > > > > > 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> I did simple change by saving gsi iterator for each bb that has >>> critical edges by adding additional field to bb_predicate_s: >>> >>> typedef struct bb_predicate_s { >>> >>> /* The condition under which this basic block is executed. */ >>> tree predicate; >>> >>> /* PREDICATE is gimplified, and the sequence of statements is >>> recorded here, in order to avoid the duplication of computations >>> that occur in previous conditions. See PR44483. */ >>> gimple_seq predicate_gimplified_stmts; >>> >>> /* Insertion point for blocks having incoming critical edges. */ >>> gimple_stmt_iterator gsi; >>> } *bb_predicate_p; >>> >>> and this iterator is saved in insert_gimplified_predicates before >>> insertion code for predicate computation. I checked that this fix >>> works. >> >> Huh? I still wonder what the issue is with inserting everything >> after the PHI we predicate. >> >> Well, your updated patch will come with testcases for the testsuite >> that will hopefully fail if doing that. >> >> Richard. >> >>> >>> Now I am implementing merging of predicate_extended.. and >>> predicate_arbitrary.. functions as you proposed. >>> >>> Best regards. >>> Yuri. >>> >>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>>> Thanks Richard for your quick reply! >>>>> >>>>> 1. I agree that we can combine predicate_extended_ and >>>>> predicate_arbitrary_ to one function as you proposed. >>>>> 2. What is your opinion about using more simple decision about >>>>> insertion point - if bb has use of phi result insert phi predication >>>>> before it and at the bb end otherwise. I assume that critical edge >>>>> splitting is not a good decision. >>>> >>>> Why not always insert before the use? Which would be after labels, >>>> what we do for two-arg PHIs. That is, how can it be that you predicate >>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>> edges you get SSA uses with defs that are in BB1 itself? That >>>> can only happen for backedges but those you can't remove in any case. >>>> >>>> Richard. >>>> >>>>> >>>>> Best regards. >>>>> Yuri. >>>>> >>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Hi Richard, >>>>>>> >>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>> I also very sorry that I sent you bad patch. >>>>>>> >>>>>>> Now let me answer on your questions related to second patch. >>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>> predicate_arbitrary_scalar_phi? >>>>>>> >>>>>>> Let's consider the following simple test-case: >>>>>>> >>>>>>> #pragma omp simd safelen(8) >>>>>>> for (i=0; i<512; i++) >>>>>>> { >>>>>>> float t = a[i]; >>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>> res += 1; >>>>>>> } >>>>>>> >>>>>>> we can see the following phi node correspondent to res: >>>>>>> >>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>> >>>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>>> and only one check can be used for phi predication (for reduction in >>>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>>> even if we sort all phi argument values since we still have to produce >>>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>>> for predicate_arbitrary_scalar_phi). >>>>>> >>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>> treat it as an 'else' case. That even works for >>>>>> >>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>> >>>>>> where the condition for edges 5 and 7 can be computed as >>>>>> ! (condition for 3 || condition for 4). >>>>>> >>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>> used for edges 3 and 4 combined. >>>>>> >>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>>> only critical incoming edges and both contain code computing edge >>>>>>> predicates, e.g. >>>>>>> >>>>>>> <bb 7>: >>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>> _46 = xmax_17 == xmax_37; >>>>>>> _47 = xmax_17 == xmax_27; >>>>>>> _48 = _46 & _47; >>>>>>> _53 = xmax_17 == xmax_37; >>>>>>> _54 = ~_53; >>>>>>> _55 = xmax_17 == xmax_27; >>>>>>> _56 = _54 & _55; >>>>>>> _57 = _48 | _56; >>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>> goto <bb 11>; >>>>>>> >>>>>>> It is evident that we can not put phi predication at the block >>>>>>> beginning but need to put it after predicate computations. >>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>> insertion point will be "after labels" Note also that phi result can >>>>>>> have use in this block too, so we can't put predication code to the >>>>>>> block end. >>>>>> >>>>>> So the issue is that predicate insertion for edge predicates does >>>>>> not happen on the edge but somewhere else (generally impossible >>>>>> for critical edges unless you split them). >>>>>> >>>>>> I think I've told you before that I prefer simple solutions to such >>>>>> issues, >>>>>> like splitting the edge! Certainly not involving a function walking >>>>>> GENERIC expressions. >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>>> Let me know if you still have any questions. >>>>>>> >>>>>>> Best regards. >>>>>>> Yuri. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Hi All, >>>>>>>>> >>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>> >>>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>>> lead to less efficient binaries. >>>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>>> scalar reduction is applied. >>>>>>>>> This is correspondent to the following statement: >>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>> New function phi_has_two_different_args was introduced to detect >>>>>>>>> such phi. >>>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it >>>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>>> are already inserted above this block and >>>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>>> computations are in the block where code for phi predication must be >>>>>>>>> inserted. So new function find_insertion_point is introduced which is >>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>>> after it (with some minor exceptions). >>>>>>>> >>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>> >>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>> >>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>> >>>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>>> inserted code if you insert after labels just like for the other case. >>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course >>>>>>>> needs >>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>> >>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>> >>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>> >>>>>>>> Richard. >>>>>>>> >>>>>>>> >>>>>>>>> ChangeLog: >>>>>>>>> >>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> >>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>> non-critical incoming edge. >>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>> statement before/after gsi point. >>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>> (find_insertion_point): New function. >>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>>> EXTENDED value. >>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>>> may have statements to insert. Insert predicate of BB just after label >>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which >>>>>>>>> is copy of inner or outer loop field force_vectorize.