Richard, I did some changes in patch and ChangeLog to mark that support for if-convert of blocks with only critical incoming edges will be added in the future (more precise in patch.4).
Could you please review it. Thanks. ChangeLog: 2014-10-21 Yuri Rumyantsev <ysrum...@gmail.com> (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list if destination block of edge is not always executed. Set-up predicate for critical edge. (if_convertible_phi_p): Accept phi nodes with more than two args if FLAG_FORCE_VECTORIZE was set-up. (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. (if_convertible_stmt_p): Fix up pre-function comments. (all_preds_critical_p): New function. (if_convertible_bb_p): Use call of all_preds_critical_p to reject temporarily block if-conversion with incoming critical edges if FLAG_FORCE_VECTORIZE was not set-up. This restriction will be deleted after adding support for extended predication. (predicate_bbs): Skip loop exit block also.Invoke build2_loc to compute predicate instead of fold_build2_loc. Add zeroing of edge 'aux' field. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and at least one incoming edge is not critical original algorithm is used. (tree_if_conversion): Temporarily set-up FLAG_FORCE_VECTORIZE to false. Nullify 'aux' field of edges for blocks with two successors. 2014-10-20 17:55 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: > Richard, > > Thanks for your answer! > > In current implementation phi node conversion assume that one of > incoming edge to bb containing given phi has at least one non-critical > edge and choose it to insert predicated code. But if we choose > critical edge we need to determine insert point and insertion > direction (before/after) since in other case we can get invalid ssa > form (use before def). This is done by my new function which is not in > current patch ( I will present this patch later). SO I assume that we > need to leave this patch as it is to not introduce new bugs. > > Thanks. > Yuri. > > 2014-10-20 12:00 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >> On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >>> Richard, >>> >>> I reworked the patch as you proposed, but I didn't understand what >>> did you mean by: >>> >>>>So please rework the patch so critical edges are always handled >>>>correctly. >>> >>> In current patch flag_force_vectorize is used (1) to reject phi nodes >>> with more than 2 arguments; (2) to reject basic blocks with only >>> critical incoming edges since support for extended predication of phi >>> nodes will be in next patch. >> >> I mean that (2) should not be rejected dependent on flag_force_vectorize. >> It was rejected because if-cvt couldn't handle it correctly before but with >> this patch this is fixed. I see no reason to still reject this then even >> for !flag_force_vectorize. >> >> Rejecting PHIs with more than two arguments with flag_force_vectorize >> is ok. >> >> Richard. >> >>> Could you please clarify your statement. >>> >>> I attached modified patch. >>> >>> ChangeLog: >>> >>> 2014-10-17 Yuri Rumyantsev <ysrum...@gmail.com> >>> >>> (flag_force_vectorize): New variable. >>> (edge_predicate): New function. >>> (set_edge_predicate): New function. >>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>> if destination block of edge is not always executed. Set-up predicate >>> for critical edge. >>> (if_convertible_phi_p): Accept phi nodes with more than two args >>> if FLAG_FORCE_VECTORIZE was set-up. >>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>> (if_convertible_stmt_p): Fix up pre-function comments. >>> (all_edges_are_critical): New function. >>> (if_convertible_bb_p): Use call of all_preds_critical_p >>> to reject block if-conversion with incoming critical edges only if >>> FLAG_FORCE_VECTORIZE was not set-up. >>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>> to compute predicate instead of fold_build2_loc. >>> Add zeroing of edge 'aux' field. >>> (find_phi_replacement_condition): Extend function interface: >>> it returns NULL if given phi node must be handled by means of >>> extended phi node predication. If number of predecessors of phi-block >>> is equal 2 and atleast one incoming edge is not critical original >>> algorithm is used. >>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>> Nullify 'aux' field of edges for blocks with two successors. >>> >>> >>> >>> >>> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>> wrote: >>>>> Richard, >>>>> >>>>> Here is reduced patch as you requested. All your remarks have been fixed. >>>>> Could you please look at it ( I have already sent the patch with >>>>> changes in add_to_predicate_list for review). >>>> >>>> + if (dump_file && (dump_flags & TDF_DETAILS)) >>>> + fprintf (dump_file, "More than two phi node args.\n"); >>>> + return false; >>>> + } >>>> + >>>> + } >>>> >>>> Excess vertical space. >>>> >>>> >>>> +/* Assumes that BB has more than 2 predecessors. >>>> >>>> More than 1 predecessor? >>>> >>>> + Returns false if at least one successor is not on critical edge >>>> + and true otherwise. */ >>>> + >>>> +static inline bool >>>> +all_edges_are_critical (basic_block bb) >>>> +{ >>>> >>>> "all_preds_critical_p" would be a better name >>>> >>>> + if (EDGE_COUNT (bb->preds) > 2) >>>> + { >>>> + if (!flag_force_vectorize) >>>> + return false; >>>> + } >>>> >>>> as I said in the last review I don't think we should restrict edge >>>> predicates to flag_force_vectorize. At least I can't see how >>>> if-conversion is magically more expensive for that case? >>>> >>>> So please rework the patch so critical edges are always handled >>>> correctly. >>>> >>>> Ok with that and the above suggested changes. >>>> >>>> Thanks, >>>> Richard. >>>> >>>> >>>>> Thanks. >>>>> Yuri. >>>>> ChangeLog >>>>> 2014-10-16 Yuri Rumyantsev <ysrum...@gmail.com> >>>>> >>>>> (flag_force_vectorize): New variable. >>>>> (edge_predicate): New function. >>>>> (set_edge_predicate): New function. >>>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list >>>>> if destination block of edge is not always executed. Set-up predicate >>>>> for critical edge. >>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>> (all_edges_are_critical): New function. >>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>> to reject block if-conversion with incoming critical edges only if >>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc >>>>> to compute predicate instead of fold_build2_loc. >>>>> Add zeroing of edge 'aux' field. >>>>> (find_phi_replacement_condition): Extend function interface: >>>>> it returns NULL if given phi node must be handled by means of >>>>> extended phi node predication. If number of predecessors of phi-block >>>>> is equal 2 and atleast one incoming edge is not critical original >>>>> algorithm is used. >>>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false. >>>>> Nullify 'aux' field of edges for blocks with two successors. >>>>> >>>>> >>>>> >>>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>> wrote: >>>>>>> Richard, >>>>>>> >>>>>>> Here is updated patch (part1) for extended if conversion. >>>>>>> >>>>>>> Second part of patch will be sent later. >>>>>> >>>>>> Ok, I'm starting to look at this. I'd still like you to split things up >>>>>> more. >>>>>> >>>>>> static inline void >>>>>> add_to_predicate_list (struct loop *loop, basic_block bb, tree nc) >>>>>> { >>>>>> ... >>>>>> >>>>>> + /* We use notion of cd equivalence to get simplier predicate for >>>>>> + join block, e.g. if join block has 2 predecessors with >>>>>> predicates >>>>>> + p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of >>>>>> + p1 & p2 | p1 & !p2. */ >>>>>> + if (dom_bb != loop->header >>>>>> + && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb) >>>>>> + { >>>>>> + gcc_assert (flow_bb_inside_loop_p (loop, dom_bb)); >>>>>> + bc = bb_predicate (dom_bb); >>>>>> + gcc_assert (!is_true_predicate (bc)); >>>>>> >>>>>> these changes look worthwhile even for !flag_force_vectorize. So please >>>>>> split the change to add_to_predicate_list out and compute post-dominators >>>>>> unconditionally. Note that you should call free_dominance_info >>>>>> (CDI_POST_DOMINATORS) at the end of if-conversion. >>>>>> >>>>>> + if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest)) >>>>>> + add_to_predicate_list (loop, e->dest, cond); >>>>>> + >>>>>> + /* If edge E is critical save predicate on it. */ >>>>>> + if (EDGE_COUNT (e->dest->preds) >= 2) >>>>>> + set_edge_predicate (e, cond); >>>>>> >>>>>> how do we know the edge is critical by this simple check? Why not >>>>>> simply always save edge predicates (well, you kind of do but omit >>>>>> the case where e->src dominates e->dest). >>>>>> >>>>>> Btw, you can rely on edge->aux being NULL at the start of the >>>>>> pass but need to clear it at the end (best use clear_aux_for_edges () >>>>>> for that). So stuff like >>>>>> >>>>>> + extract_true_false_edges_from_block (bb, &true_edge, >>>>>> &false_edge); >>>>>> + if (flag_force_vectorize) >>>>>> + true_edge->aux = false_edge->aux = NULL; >>>>>> >>>>>> shouldn't be necessary. >>>>>> >>>>>> I think the edge predicate handling should also be unconditionally >>>>>> and not depend on flag_force_vectorize. >>>>>> >>>>>> + /* The loop latch and loop exit block are always executed and >>>>>> + have no extra conditions to be processed: skip them. */ >>>>>> + if (bb == loop->latch >>>>>> + || bb_with_exit_edge_p (loop, bb)) >>>>>> >>>>>> I don't think the edge stuff is true - given you still only reset the >>>>>> loop->latch bb predicate the change looks broken. >>>>>> >>>>>> + /* Fold_build2 can produce bool conversion which is not >>>>>> + supported by vectorizer, so re-build it without folding. >>>>>> + For example, such conversion is generated for sequence: >>>>>> + _Bool _7, _8, _9; >>>>>> + _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9; >>>>>> + if (_9 != 0) --> (bool)_9. */ >>>>>> + >>>>>> + if (CONVERT_EXPR_P (c) >>>>>> + && TREE_CODE_CLASS (code) == tcc_comparison) >>>>>> >>>>>> I think you should simply use canonicalize_cond_expr_cond on the >>>>>> folding result. Or rather _not_ fold at all - we are taking the >>>>>> operands from the GIMPLE condition unmodified after all. >>>>>> >>>>>> - add_to_dst_predicate_list (loop, false_edge, >>>>>> - unshare_expr (cond), c2); >>>>>> + add_to_dst_predicate_list (loop, false_edge, unshare_expr >>>>>> (cond), >>>>>> + unshare_expr (c2)); >>>>>> >>>>>> why is it necessary to unshare c2? >>>>>> >>>>>> Please split out the PHI-with-multi-arg handling (I have not looked at >>>>>> that in detail). >>>>>> >>>>>> Thanks, >>>>>> Richard. >>>>>> >>>>>> >>>>>>> Changelog. >>>>>>> >>>>>>> 2014-10-13 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>> >>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. >>>>>>> (flag_force_vectorize): New variable. >>>>>>> (edge_predicate): New function. >>>>>>> (set_edge_predicate): New function. >>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>> for join blocks if it exists. >>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>> destination block of edge is not always executed. Set-up predicate >>>>>>> for critical edge. >>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>> (all_edges_are_critical): New function. >>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. >>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>> algorithm is used. >>>>>>> (get_predicate_for_edge): New function. >>>>>>> (find_insertion_point): New function. >>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>> that extended predication must be applied). >>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>> predicates at the block begining for extended if-conversion. >>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>> for innermost loop marked with pragma omp simd and >>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges >>>>>>> for blocks with two successors. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>: >>>>>>>> Richard, >>>>>>>> >>>>>>>> here is reduced patch (part.1) which was reduced almost twice. >>>>>>>> Let's me also answer on your comments. >>>>>>>> >>>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges. >>>>>>>> My previous code was not correct and now it looks like: >>>>>>>> >>>>>>>> if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1) >>>>>>>> /* Edge E is not critical, use predicate of edge source bb. */ >>>>>>>> c = bb_predicate (b); >>>>>>>> else >>>>>>>> /* Edge E is critical and its aux field contains predicate. */ >>>>>>>> c = edge_predicate (e); >>>>>>>> >>>>>>>> 2. I completely delete all code related to creation of conditional >>>>>>>> expressions and completely rely on bool pattern recognition in >>>>>>>> vectorizer. But we need to delete all dead predicate computations >>>>>>>> which are not used since they prevent vectorization. I will add this >>>>>>>> local-dce function in next patch. >>>>>>>> 3. I also did not include in this patch recognition of general >>>>>>>> phi-nodes with two arguments only for which conversion of conditional >>>>>>>> scalar reduction can be applied also. >>>>>>>> Note that all these changes are applied for loop marked with pragma >>>>>>>> omp simd only. >>>>>>>> >>>>>>>> 2014-09-22 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>> >>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. >>>>>>>> (flag_force_vectorize): New variable. >>>>>>>> (edge_predicate): New function. >>>>>>>> (set_edge_predicate): New function. >>>>>>>> (convert_name_to_cmp): New function. >>>>>>>> (add_to_predicate_list): Check unconditionally that bb is always >>>>>>>> executed to early exit. Use predicate of cd-equivalent block >>>>>>>> for join blocks if it exists. >>>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if >>>>>>>> destination block of edge is not always executed. Set-up predicate >>>>>>>> for critical edge. >>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>> if FLAG_FORCE_VECTORIZE was set-up. >>>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE. >>>>>>>> (if_convertible_stmt_p): Fix up pre-function comments. >>>>>>>> (all_edges_are_critical): New function. >>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical >>>>>>>> to reject block if-conversion with incoming critical edges only if >>>>>>>> FLAG_FORCE_VECTORIZE was not set-up. >>>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if >>>>>>>> fold_build2 produces bool conversion, recompute predicate using >>>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE. >>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's. >>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>> algorithm is used. >>>>>>>> (get_predicate_for_edge): New function. >>>>>>>> (find_insertion_point): New function. >>>>>>>> (predicate_arbitrary_scalar_phi): New function. >>>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE. >>>>>>>> Invoke find_insertion_point to initialize gsi and >>>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals >>>>>>>> that extended predication must be applied). >>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>>> for innermost loop marked with pragma omp simd and >>>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges >>>>>>>> for blocks with two successors. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: >>>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>> Richard! >>>>>>>>>> Here is updated patch with the following changes: >>>>>>>>>> >>>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended >>>>>>>>>> conversion. >>>>>>>>>> 2. Put predicate for critical edges to 'aux' field of edge, i.e. >>>>>>>>>> negate_predicate was deleted. >>>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can >>>>>>>>>> be critical. >>>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic >>>>>>>>>> blocks to simplify it. >>>>>>>>>> 5. I decided to not design pre-pass since it will lead generating >>>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi >>>>>>>>>> of kind >>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>> only one cond expression is required and this is considered as simple >>>>>>>>>> optimization for arbitrary phi-function. More precise, >>>>>>>>>> if phi-function have only two different arguments and one of them has >>>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2 >>>>>>>>>> arguments. >>>>>>>>>> For arbitrary phi function a chain of cond expressions is produced. >>>>>>>>>> >>>>>>>>>> Updated patch is attached. >>>>>>>>>> >>>>>>>>>> Any comments will be appreciated. >>>>>>>>> >>>>>>>>> The patch is still very big and does multiple things at once which >>>>>>>>> makes >>>>>>>>> it hard to review. >>>>>>>>> >>>>>>>>> In addition to that it changes function singatures without updating >>>>>>>>> the function comments. For example what is the convert_bool >>>>>>>>> argument doing to add_to_dst_predicate_list? Why do we need >>>>>>>>> all this added logic. >>>>>>>>> >>>>>>>>> You duplicate operand_equal_for_phi_arg_p. >>>>>>>>> >>>>>>>>> I think the code handling PHIs with more than two operands but >>>>>>>>> only two unequal operands is useful generally, so that's an obvious >>>>>>>>> candidate for splitting out into a separate patch. >>>>>>>>> >>>>>>>>> + CONVERT_BOOL argument was added to convert bool predicate >>>>>>>>> computations >>>>>>>>> + which is not supported by vectorizer to int type through creating >>>>>>>>> of >>>>>>>>> + conditional expressions. */ >>>>>>>>> >>>>>>>>> Example? The vectorizer has patterns for bool predicate computations. >>>>>>>>> This seems to be another feature that needs splitting out. >>>>>>>>> >>>>>>>>> The way you get around the critical edge parts looks awkward to me. >>>>>>>>> Please either do _all_ predicates as edge predicates or simply >>>>>>>>> split critical edges (of the respective loop body). >>>>>>>>> >>>>>>>>> I still think that an utility doing same PHI arg merging by >>>>>>>>> introducing >>>>>>>>> forwarder blocks would be nicer to have. >>>>>>>>> >>>>>>>>> I'd restructure the main tree_if_conversion function to apply these >>>>>>>>> CFG pre-transforms when we are going to version the loop >>>>>>>>> for if conversion (eventually transitioning to always doing that). >>>>>>>>> >>>>>>>>> So - please split up the patch. It's way too big. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>> >>>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function >>>>>>>>>> clone. >>>>>>>>>> (flag_force_vectorize): New variable. >>>>>>>>>> (edge_predicate): New function. >>>>>>>>>> (set_edge_predicate): New function. >>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and >>>>>>>>>> such bb exists; save it in static variable for further possible use. >>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>> Set-up predicate for crritical edge if convert_bool is true. >>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>> if flag_force_vectorize wa set-up. >>>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize. >>>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if >>>>>>>>>> flag_force_vectorize was set-up. >>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical >>>>>>>>>> to reject block if-conversion with imcoming critical edges only if >>>>>>>>>> flag_force_vectorize was not set-up. >>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform >>>>>>>>>> comparison expressions of boolean type into conditional expressions >>>>>>>>>> with integral operands. If convert_bool argument was set-up and >>>>>>>>>> vect bool pattern can be appied perform the following transformation: >>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>> Add check that if fold_build2 produces bool conversion if >>>>>>>>>> convert_bool >>>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument >>>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and >>>>>>>>>> add_to_predicate_list. >>>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if >>>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's. >>>>>>>>>> Call predicate_bbs with additional argument equal to false. >>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>> extended phi node predication. If number of predecessors of phi-block >>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>> algorithm is used. >>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals >>>>>>>>>> that >>>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args. >>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond >>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>> (predicate_arbitrary_phi): New function. >>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>>> predication to build mask. >>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning >>>>>>>>>> for innermost loop marked with pragma omp simd. >>>>>>>>>> >>>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener >>>>>>>>>> <richard.guent...@gmail.com>: >>>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev >>>>>>>>>>> <ysrum...@gmail.com> wrote: >>>>>>>>>>>> Hi All, >>>>>>>>>>>> >>>>>>>>>>>> We implemented additional support for pragma omp simd in part of >>>>>>>>>>>> extended if-conversion loops with such pragma. These extensions >>>>>>>>>>>> include: >>>>>>>>>>>> >>>>>>>>>>>> 1. All extensions are performed only if considered loop or its >>>>>>>>>>>> outer >>>>>>>>>>>> loop was marked with pragma omp simd (force_vectorize); For >>>>>>>>>>>> ordinary >>>>>>>>>>>> loops behavior was not changed. >>>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more >>>>>>>>>>>> than 2 >>>>>>>>>>>> predecessors. >>>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in >>>>>>>>>>>> current design: >>>>>>>>>>>> all phi nodes must be in non-predicated basic block to conform >>>>>>>>>>>> semantic of COND_EXPR which is used for transformation. >>>>>>>>>>> >>>>>>>>>>> How is that so? If the PHI is predicated then its result will be >>>>>>>>>>> used >>>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs. >>>>>>>>>>> >>>>>>>>>>> No? >>>>>>>>>>> >>>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 >>>>>>>>>>>> arguments >>>>>>>>>>>> with some limitations: >>>>>>>>>>>> - for phi nodes which have more than 2 arguments, but only two >>>>>>>>>>>> arguments are different and one of them has the only occurence, >>>>>>>>>>>> transformation to single COND_EXPR can be done. >>>>>>>>>>>> - if phi node has more different arguments and all edge >>>>>>>>>>>> predicates >>>>>>>>>>>> correspondent to phi-arguments are disjoint, a chain of >>>>>>>>>>>> COND_EXPR >>>>>>>>>>>> will be generated for it. In current design very simple check >>>>>>>>>>>> is used: >>>>>>>>>>>> check starting from end that two edges correspondent to neighbor >>>>>>>>>>>> arguments have common predecessor which is used for further check >>>>>>>>>>>> with next edge. >>>>>>>>>>>> These guarantee that phi predication will produce the correct >>>>>>>>>>>> result. >>>>>>>>>>> >>>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by >>>>>>>>>>> inserting forwarder blocks. Thus >>>>>>>>>>> >>>>>>>>>>> x = PHI <1(2), 1(3), 2(4)> >>>>>>>>>>> >>>>>>>>>>> becomes >>>>>>>>>>> >>>>>>>>>>> bb 5: <forwarder-from(2)-and(3)> >>>>>>>>>>> >>>>>>>>>>> x = PHI <1(5), 2(4)> >>>>>>>>>>> >>>>>>>>>>> and >>>>>>>>>>> >>>>>>>>>>> x = PHI <1(2), 2(3), 3(4)> >>>>>>>>>>> >>>>>>>>>>> becomes >>>>>>>>>>> >>>>>>>>>>> bb 5: >>>>>>>>>>> x' = PHI <1(2), 2(3)> >>>>>>>>>>> >>>>>>>>>>> b = PHI<x'(5), 3(4)> >>>>>>>>>>> >>>>>>>>>>> which means that 3) has to work. Note that we want this kind of >>>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of >>>>>>>>>>> copies we need to insert on edges. >>>>>>>>>>> >>>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass >>>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform. >>>>>>>>>>> And make 3) work properly if it doesn't already. >>>>>>>>>>> >>>>>>>>>>> It looks like you introduce a "negate predicate" to work around the >>>>>>>>>>> critical edge limitation? Please instead change if-conversion to >>>>>>>>>>> work with edge predicates (as opposed to BB predicates). >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Here is example of such extended predication (compile with >>>>>>>>>>>> -march=core-avx2): >>>>>>>>>>>> #pragma omp simd safelen(8) >>>>>>>>>>>> for (i=0; i<512; i++) >>>>>>>>>>>> { >>>>>>>>>>>> float t = a[i]; >>>>>>>>>>>> if (t > 0 & t < 1.0e+17f) >>>>>>>>>>>> if (c[i] != 0) >>>>>>>>>>>> res += 1; >>>>>>>>>>>> } >>>>>>>>>>>> <bb 4>: >>>>>>>>>>>> # res_15 = PHI <res_1(5), 0(3)> >>>>>>>>>>>> # i_16 = PHI <i_11(5), 0(3)> >>>>>>>>>>>> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >>>>>>>>>>>> t_5 = a[i_16]; >>>>>>>>>>>> _6 = t_5 > 0.0; >>>>>>>>>>>> _7 = t_5 < 9.9999998430674944e+16; >>>>>>>>>>>> _8 = _7 & _6; >>>>>>>>>>>> _ifc__28 = (unsigned int) _8; >>>>>>>>>>>> _10 = &c[i_16]; >>>>>>>>>>>> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >>>>>>>>>>>> _9 = MASK_LOAD (_10, 0B, _ifc__36); >>>>>>>>>>>> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>> _ifc__30 = (int) _ifc__29; >>>>>>>>>>>> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >>>>>>>>>>>> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >>>>>>>>>>>> _ifc__33 = (int) _ifc__32; >>>>>>>>>>>> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >>>>>>>>>>>> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >>>>>>>>>>>> res_1 = res_15 + _ifc__35; >>>>>>>>>>>> i_11 = i_16 + 1; >>>>>>>>>>>> ivtmp_14 = ivtmp_17 - 1; >>>>>>>>>>>> if (ivtmp_14 != 0) >>>>>>>>>>>> goto <bb 4>; >>>>>>>>>>>> >>>>>>>>>>>> Bootstrap and regression testing did not show any new failures. >>>>>>>>>>>> >>>>>>>>>>>> gcc/ChageLog >>>>>>>>>>>> >>>>>>>>>>>> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >>>>>>>>>>>> >>>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable. >>>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field. >>>>>>>>>>>> (bb_negate_predicate): New function. >>>>>>>>>>>> (set_bb_negate_predicate): New function. >>>>>>>>>>>> (bb_copy_predicate): New function. >>>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >>>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field. >>>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >>>>>>>>>>>> (convert_name_to_cmp): New function. >>>>>>>>>>>> (get_type_for_cond): New function. >>>>>>>>>>>> (convert_bool_predicate): New function. >>>>>>>>>>>> (predicate_disjunction): New function. >>>>>>>>>>>> (predicate_conjunction): New function. >>>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument. >>>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true. >>>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument. >>>>>>>>>>>> Add early function exit if edge target block is always executed. >>>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true. >>>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list. >>>>>>>>>>>> (equal_phi_args): New function. >>>>>>>>>>>> (phi_has_two_different_args): New function. >>>>>>>>>>>> (phi_args_disjoint): New function. >>>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args >>>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are >>>>>>>>>>>> in non-predicated basic blocks. >>>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >>>>>>>>>>>> (all_edges_are_critical): New function. >>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if >>>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical >>>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if >>>>>>>>>>>> flag_force_vectorize was not setup. >>>>>>>>>>>> (walk_cond_tree): New function. >>>>>>>>>>>> (vect_bool_pattern_is_applicable): New function. >>>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to >>>>>>>>>>>> transform >>>>>>>>>>>> comparison expressions of boolean type into conditional expressions >>>>>>>>>>>> with integral operands. If bool_conv argument is false or both >>>>>>>>>>>> outgoing edges are not critical old algorithm of predicate >>>>>>>>>>>> assignments >>>>>>>>>>>> is used, otherwise the following code was added: check on >>>>>>>>>>>> applicable >>>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of >>>>>>>>>>>> (bool) x != 0 --> y = (int) x; x != 0; >>>>>>>>>>>> compute predicates for both outgoing edges one of which is critical >>>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates >>>>>>>>>>>> using >>>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in >>>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and >>>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical >>>>>>>>>>>> edge, >>>>>>>>>>>> but generated gimplified statements are stored in their destination >>>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to >>>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list. >>>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional >>>>>>>>>>>> argument >>>>>>>>>>>> equal to false. >>>>>>>>>>>> (find_phi_replacement_condition): Extend function interface: >>>>>>>>>>>> it returns NULL if given phi node must be handled by means of >>>>>>>>>>>> extended phi node predication. If number of predecessors of >>>>>>>>>>>> phi-block >>>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original >>>>>>>>>>>> algorithm is used. >>>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals >>>>>>>>>>>> that >>>>>>>>>>>> both phi arguments must be evaluated through >>>>>>>>>>>> phi_has_two_different_args. >>>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if >>>>>>>>>>>> cond >>>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of >>>>>>>>>>>> is_cond_scalar_reduction. >>>>>>>>>>>> (get_predicate_for_edge): New function. >>>>>>>>>>>> (find_insertion_point): New function. >>>>>>>>>>>> (predicate_phi_disjoint_args): New function. >>>>>>>>>>>> (predicate_extended_scalar_phi): New function. >>>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement >>>>>>>>>>>> iterator for predication of extended scalar phi's for insertion. >>>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic >>>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert >>>>>>>>>>>> predicates at the block begining for extended if-conversion. >>>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >>>>>>>>>>>> predication to build mask. >>>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >>>>>>>>>>>> (split_crit_edge): New function. >>>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current >>>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke >>>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for >>>>>>>>>>>> innermost loop marked with pragma omp simd.
if-conv.patch2.final
Description: Binary data