On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: > Richard, > > Sorry that I forgot to delete debug dump from my fix. > I have few questions about your comments. > > 1. You wrote : >> You also still have two functions for PHI predication. And the >> new extended variant doesn't commonize the 2-args and general >> path > Did you mean that I must combine predicate_scalar_phi and > predicate_extended scalar phi to one function? > Please note that if additional flag was not set up (i.e. > aggressive_if_conv is false) extended predication is required more > compile time since it builds hash_map.
It's compile-time complexity is reasonable enough even for non-aggressive if-conversion. > 2. About critical edge splitting. > > Did you mean that we should perform it (1) under aggressive_if_conv > option only; (2) should we split all critical edges. > Note that this leads to recomputing of topological order. Well, I don't mind splitting all critical edges unconditionally, thus do something like Index: gcc/tree-if-conv.c =================================================================== --- gcc/tree-if-conv.c (revision 218515) +++ gcc/tree-if-conv.c (working copy) @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f if (number_of_loops (fun) <= 1) return 0; + bool critical_edges_split_p = false; FOR_EACH_LOOP (loop, 0) if (flag_tree_loop_if_convert == 1 || flag_tree_loop_if_convert_stores == 1 || ((flag_tree_loop_vectorize || loop->force_vectorize) && !loop->dont_vectorize)) - todo |= tree_if_conversion (loop); + { + if (!critical_edges_split_p) + { + split_critical_edges (); + critical_edges_split_p = true; + todo |= TODO_cleanup_cfg; + } + todo |= tree_if_conversion (loop); + } #ifdef ENABLE_CHECKING { > It is worth noting that in current implementation bb's with 2 > predecessors and both are on critical edges are accepted without > additional option. Yes, I know. tree-if-conv.c is a mess right now and if we can avoid adding more to it and even fix the critical edge missed optimization with splitting critical edges then I am all for that solution. Richard. > Thanks ahead. > Yuri. > 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> Here is updated patch2 with the following changes: >>> 1. Delete functions phi_has_two_different_args and find_insertion_point. >>> 2. Use only one function for extended predication - >>> predicate_extended_scalar_phi. >>> 3. Save gsi before insertion of predicate computations for basic >>> blocks if it has 2 predecessors and >>> both incoming edges are critical or it gas more than 2 predecessors >>> and at least one incoming edge >>> is critical. This saved iterator can be used by extended phi predication. >>> >>> Here is motivated test-case which explains this point. >>> Test-case is attached (t5.c) and it must be compiled with -O2 >>> -ftree-loop-vectorize -fopenmp options. >>> The problem phi is in bb-7: >>> >>> bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 }) >>> { >>> <bb 5>: >>> xmax_edge_18 = xmax_edge_36 + 1; >>> if (xmax_17 == xmax_27) >>> goto <bb 7>; >>> else >>> goto <bb 9>; >>> >>> } >>> bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 }) >>> { >>> <bb 6>: >>> if (xmax_17 == xmax_27) >>> goto <bb 7>; >>> else >>> goto <bb 8>; >>> >>> } >>> bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 }) >>> { >>> <bb 7>: >>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>> xmax_edge_19 = xmax_edge_39 + 1; >>> goto <bb 11>; >>> >>> } >>> >>> Note that both incoming edges to bb_7 are critical. If we comment out >>> restoring gsi in predicate_all_scalar_phi: >>> #if 0 >>> if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb)) >>> || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb))) >>> gsi = bb_insert_point (bb); >>> else >>> #endif >>> gsi = gsi_after_labels (bb); >>> >>> we will get ICE: >>> t5.c: In function 'foo': >>> t5.c:9:6: error: definition in block 4 follows the use >>> void foo (int n) >>> ^ >>> for SSA_NAME: _1 in statement: >>> _52 = _1 & _3; >>> t5.c:9:6: internal compiler error: verify_ssa failed >>> >>> smce predicate computations were inserted in bb_7. >> >> The issue is obviously that the predicates have already been emitted >> in the target BB - that's of course the wrong place. This is done >> by insert_gimplified_predicates. >> >> This just shows how edge predicate handling is broken - we don't >> seem to have a sequence of gimplified stmts for edge predicates >> but push those to e->dest which makes this really messy. >> >> Rather than having a separate phase where we insert all >> gimplified bb predicates we should do that on-demand when >> predicating a PHI. >> >> Your patch writes to stderr - that's bad - use dump_file and guard >> the printfs properly. >> >> You also still have two functions for PHI predication. And the >> new extended variant doesn't commonize the 2-args and general >> paths. >> >> I'm not at all happy with this code. It may be existing if-conv codes >> fault but making it even worse is not an option. >> >> Again - what's wrong with simply splitting critical edges if >> aggressive_if_conv? I think that would very much simplify >> things here. Or alternatively use gsi_insert_on_edge and >> commit edge insertions before merging the blocks. >> >> Thanks, >> Richard. >> >>> ChangeLog is >>> >>> 2014-12-09 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> * tree-if-conv.c : Include hash-map.h. >>> (struct bb_predicate_s): Add new field to save copy of gimple >>> statement iterator. >>> (bb_insert_point): New function. >>> (set_bb_insert_point): New function. >>> (has_pred_critical_p): New function. >>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>> AGGRESSIVE_IF_CONV is true. >>> (if_convertible_bb_p): Delete check that bb has at least one >>> non-critical incoming edge. >>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED. >>> Allow interchange PHI arguments if EXTENDED is false. >>> Change check that block containing reduction statement candidate >>> is predecessor of phi-block since phi may have more than two arguments. >>> (predicate_scalar_phi): Add new arguments for call of >>> is_cond_scalar_reduction. >>> (get_predicate_for_edge): New function. >>> (struct phi_args_hash_traits): New type. >>> (phi_args_hash_traits::hash): New function. >>> (phi_args_hash_traits::equal_keys): New function. >>> (gen_phi_arg_condition): New function. >>> (predicate_extended_scalar_phi): New function. >>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it >>> to true if BB containing phi has more than 2 predecessors or both >>> incoming edges are critical. Invoke find_phi_replacement_condition and >>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB >>> has 2 predecessors and both incoming edges are critical or it has more >>> than 2 predecessors and atleast one incoming edge is critical. >>> Use standard gsi_after_labels otherwise. >>> Invoke predicate_extended_scalar_phi if EXTENDED is true. >>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION >>> to save gsi before insertion of predicate computations. SEt-up it to >>> true for BB with 2 predecessors and critical incoming edges either >>> number of predecessors is geater 2 and at least one incoming edge is >>> critical. >>> Add check that non-predicated block may have statements to insert. >>> Insert predicate computation of BB just after label if >>> EXTENDED_PREDICATION is true. >>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which >>> is copy of inner or outer loop force_vectorize field. >>> >>> >>> >>> >>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>>>> Richard, >>>>> >>>>> I did simple change by saving gsi iterator for each bb that has >>>>> critical edges by adding additional field to bb_predicate_s: >>>>> >>>>> typedef struct bb_predicate_s { >>>>> >>>>> /* The condition under which this basic block is executed. */ >>>>> tree predicate; >>>>> >>>>> /* PREDICATE is gimplified, and the sequence of statements is >>>>> recorded here, in order to avoid the duplication of computations >>>>> that occur in previous conditions. See PR44483. */ >>>>> gimple_seq predicate_gimplified_stmts; >>>>> >>>>> /* Insertion point for blocks having incoming critical edges. */ >>>>> gimple_stmt_iterator gsi; >>>>> } *bb_predicate_p; >>>>> >>>>> and this iterator is saved in insert_gimplified_predicates before >>>>> insertion code for predicate computation. I checked that this fix >>>>> works. >>>> >>>> Huh? I still wonder what the issue is with inserting everything >>>> after the PHI we predicate. >>>> >>>> Well, your updated patch will come with testcases for the testsuite >>>> that will hopefully fail if doing that. >>>> >>>> Richard. >>>> >>>>> >>>>> Now I am implementing merging of predicate_extended.. and >>>>> predicate_arbitrary.. functions as you proposed. >>>>> >>>>> Best regards. >>>>> Yuri. >>>>> >>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Thanks Richard for your quick reply! >>>>>>> >>>>>>> 1. I agree that we can combine predicate_extended_ and >>>>>>> predicate_arbitrary_ to one function as you proposed. >>>>>>> 2. What is your opinion about using more simple decision about >>>>>>> insertion point - if bb has use of phi result insert phi predication >>>>>>> before it and at the bb end otherwise. I assume that critical edge >>>>>>> splitting is not a good decision. >>>>>> >>>>>> Why not always insert before the use? Which would be after labels, >>>>>> what we do for two-arg PHIs. That is, how can it be that you predicate >>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming >>>>>> edges you get SSA uses with defs that are in BB1 itself? That >>>>>> can only happen for backedges but those you can't remove in any case. >>>>>> >>>>>> Richard. >>>>>> >>>>>>> >>>>>>> Best regards. >>>>>>> Yuri. >>>>>>> >>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> wrote: >>>>>>>>> Hi Richard, >>>>>>>>> >>>>>>>>> I resend you patch1 and patch2 with minor changes: >>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv. >>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge. >>>>>>>>> I also very sorry that I sent you bad patch. >>>>>>>>> >>>>>>>>> Now let me answer on your questions related to second patch. >>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and >>>>>>>>> predicate_arbitrary_scalar_phi? >>>>>>>>> >>>>>>>>> Let's consider the following simple test-case: >>>>>>>>> >>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>> for (i=0; i<512; i++) >>>>>>>>> { >>>>>>>>> float t = a[i]; >>>>>>>>> if (t > 0.0f & t < 1.0e+17f) >>>>>>>>> if (c[i] != 0) /* c is integer array. */ >>>>>>>>> res += 1; >>>>>>>>> } >>>>>>>>> >>>>>>>>> we can see the following phi node correspondent to res: >>>>>>>>> >>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)> >>>>>>>>> >>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only >>>>>>>>> and only one check can be used for phi predication (for reduction in >>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it >>>>>>>>> even if we sort all phi argument values since we still have to produce >>>>>>>>> a chain of cond expressions to perform phi predication (see comments >>>>>>>>> for predicate_arbitrary_scalar_phi). >>>>>>>> >>>>>>>> How so? We can always use !(condition) for the "last" value, thus >>>>>>>> treat it as an 'else' case. That even works for >>>>>>>> >>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)> >>>>>>>> >>>>>>>> where the condition for edges 5 and 7 can be computed as >>>>>>>> ! (condition for 3 || condition for 4). >>>>>>>> >>>>>>>> Of course it is worthwhile to also sort single-occurances first >>>>>>>> so your case gets just the condiiton for edge 5 and its inversion >>>>>>>> used for edges 3 and 4 combined. >>>>>>>> >>>>>>>>> 2. Why we need to introduce find_insertion_point? >>>>>>>>> Let's consider another test-case extracted from 175.vpr ( t5.c is >>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has >>>>>>>>> only critical incoming edges and both contain code computing edge >>>>>>>>> predicates, e.g. >>>>>>>>> >>>>>>>>> <bb 7>: >>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)> >>>>>>>>> _46 = xmax_17 == xmax_37; >>>>>>>>> _47 = xmax_17 == xmax_27; >>>>>>>>> _48 = _46 & _47; >>>>>>>>> _53 = xmax_17 == xmax_37; >>>>>>>>> _54 = ~_53; >>>>>>>>> _55 = xmax_17 == xmax_27; >>>>>>>>> _56 = _54 & _55; >>>>>>>>> _57 = _48 | _56; >>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1; >>>>>>>>> goto <bb 11>; >>>>>>>>> >>>>>>>>> It is evident that we can not put phi predication at the block >>>>>>>>> beginning but need to put it after predicate computations. >>>>>>>>> Note also that if there are no critical edges for phi arguments >>>>>>>>> insertion point will be "after labels" Note also that phi result can >>>>>>>>> have use in this block too, so we can't put predication code to the >>>>>>>>> block end. >>>>>>>> >>>>>>>> So the issue is that predicate insertion for edge predicates does >>>>>>>> not happen on the edge but somewhere else (generally impossible >>>>>>>> for critical edges unless you split them). >>>>>>>> >>>>>>>> I think I've told you before that I prefer simple solutions to such >>>>>>>> issues, >>>>>>>> like splitting the edge! Certainly not involving a function walking >>>>>>>> GENERIC expressions. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Richard. >>>>>>>> >>>>>>>>> Let me know if you still have any questions. >>>>>>>>> >>>>>>>>> Best regards. >>>>>>>>> Yuri. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener >>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev >>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>> Hi All, >>>>>>>>>>> >>>>>>>>>>> Here is the second patch related to extended predication. >>>>>>>>>>> Few comments which explain a main goal of design. >>>>>>>>>>> >>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may >>>>>>>>>>> lead to less efficient binaries. >>>>>>>>>>> 2. One special case of extended PHI node predication was introduced >>>>>>>>>>> when #arguments is more than 2 but only two arguments are different >>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional >>>>>>>>>>> scalar reduction is applied. >>>>>>>>>>> This is correspondent to the following statement: >>>>>>>>>>> if (q1 && q2 && q3) var++ >>>>>>>>>>> New function phi_has_two_different_args was introduced to detect >>>>>>>>>>> such phi. >>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at >>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - >>>>>>>>>>> it >>>>>>>>>>> guarantees that all computations related to predicate of normal edge >>>>>>>>>>> are already inserted above this block and >>>>>>>>>>> code related to PHI predication can be inserted at the beginning of >>>>>>>>>>> block. But this is not true for critical edges for which predicate >>>>>>>>>>> computations are in the block where code for phi predication must >>>>>>>>>>> be >>>>>>>>>>> inserted. So new function find_insertion_point is introduced which >>>>>>>>>>> is >>>>>>>>>>> simply found out the last statement in block defining predicates >>>>>>>>>>> correspondent to all incoming edges and insert phi predication code >>>>>>>>>>> after it (with some minor exceptions). >>>>>>>>>> >>>>>>>>>> Unfortunately the patch doesn't apply for me - I get >>>>>>>>>> >>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@ >>>>>>>>>> predicate_all_scalar_phis (struct loop *loop) >>>>>>>>>> >>>>>>>>>> a few remarks nevertheless. I don't see how we need both >>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi. >>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value >>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi? >>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal. >>>>>>>>>> >>>>>>>>>> I don't understand the need for find_insertion_point. All SSA names >>>>>>>>>> required for the predicates are defined upward - and the complex CFG >>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the >>>>>>>>>> inserted code if you insert after labels just like for the other >>>>>>>>>> case. >>>>>>>>>> Or what am I missing? ("flattening" of the basic-blocks of course >>>>>>>>>> needs >>>>>>>>>> to happen in dominator order - but I guess that happens already?) >>>>>>>>>> >>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even >>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple >>>>>>>>>> times that would have been nice to vectorize. I suggest to >>>>>>>>>> add -ftree-loop-if-convert-aggressive for this. We can do this as >>>>>>>>>> followup, but please rename the local flag_force_vectorize flag >>>>>>>>>> to something less looking like a flag, like simply 'aggressive'. >>>>>>>>>> >>>>>>>>>> Otherwise patch 2 looks ok to me. >>>>>>>>>> >>>>>>>>>> Richard. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> ChangeLog: >>>>>>>>>>> >>>>>>>>>>> 2014-10-24 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>> >>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use >>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag. >>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if >>>>>>>>>>> FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one >>>>>>>>>>> non-critical incoming edge. >>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access >>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi >>>>>>>>>>> arguments if EXTENDED is true. Change check that block >>>>>>>>>>> containing reduction statement candidate is predecessor >>>>>>>>>>> of phi-block since phi may have more than two arguments. >>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert >>>>>>>>>>> statement before/after gsi point. >>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended >>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument >>>>>>>>>>> true (which correspondent to argument BEFORE) to call of >>>>>>>>>>> convert_scalar_cond_reduction. >>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and >>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more >>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke >>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or >>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on >>>>>>>>>>> EXTENDED value. >>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block >>>>>>>>>>> may have statements to insert. Insert predicate of BB just after >>>>>>>>>>> label >>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true. >>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE >>>>>>>>>>> which >>>>>>>>>>> is copy of inner or outer loop field force_vectorize.