Richard! Here is updated patch with the following changes: 1. Any restrictions on phi-function were eliminated for extended conversion. 2. Put predicate for critical edges to 'aux' field of edge, i.e. negate_predicate was deleted. 3. Deleted splitting of critical edges, i.e. both outgoing edges can be critical. 4. Use notion of cd-equivalence to set-up predicate for join basic blocks to simplify it. 5. I decided to not design pre-pass since it will lead generating chain of cond expressions for phi-node if conversion, whereas for phi of kind x = PHI <1(2), 1(3), 2(4)> only one cond expression is required and this is considered as simple optimization for arbitrary phi-function. More precise, if phi-function have only two different arguments and one of them has single occurrence, if- conversion is performed as if phi have only 2 arguments. For arbitrary phi function a chain of cond expressions is produced.
Updated patch is attached. Any comments will be appreciated. 2014-08-15 Yuri Rumyantsev <ysrum...@gmail.com> * tree-if-conv.c (cgraph.h): Add include file to detect function clone. (flag_force_vectorize): New variable. (edge_predicate): New function. (set_edge_predicate): New function. (add_stmt_to_bb_predicate_gimplified_stmts): New function. (init_bb_predicate): Add initialization of negate_predicate field. (reset_bb_predicate): Reset negate_predicate to NULL_TREE. (convert_name_to_cmp): New function. (get_type_for_cond): New function. (convert_bool_predicate): New function. (predicate_disjunction): New function. (predicate_conjunction): New function. (add_to_predicate_list): Add convert_bool argument. Use predicate of cd-equivalent block if convert_bool is true and such bb exists; save it in static variable for further possible use. Add call of predicate_disjunction if convert_bool argument is true. (add_to_dst_predicate_list): Add convert_bool argument. Add early function exit if edge target block is always executed. Add call of predicate_conjunction if convert_bool argument is true. Pass convert_bool argument for add_to_predicate_list. Set-up predicate for crritical edge if convert_bool is true. (equal_phi_args): New function. (phi_has_two_different_args): New function. (if_convertible_phi_p): Accept phi nodes with more than two args if flag_force_vectorize wa set-up. (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize. (if_convertible_stmt_p): Allow calls of function clones if flag_force_vectorize was set-up. (all_edges_are_critical): New function. (if_convertible_bb_p): Allow bb has more than two predecessors if flag_force_vectorize was set-up. Use call of all_edges_are_critical to reject block if-conversion with imcoming critical edges only if flag_force_vectorize was not set-up. (walk_cond_tree): New function. (vect_bool_pattern_is_applicable): New function. (predicate_bbs): Add convert_bool argument which is used to transform comparison expressions of boolean type into conditional expressions with integral operands. If convert_bool argument was set-up and vect bool pattern can be appied perform the following transformation: (bool) x != 0 --> y = (int) x; x != 0; Add check that if fold_build2 produces bool conversion if convert_bool was set-up, recompute predicate using build2_loc. Additional argument 'convert_bool" is passed to add_to_dst_predicate_list and add_to_predicate_list. (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if flag_force_vectorize was set-up to calculate cd equivalent bb's. Call predicate_bbs with additional argument equal to false. (find_phi_replacement_condition): Extend function interface: it returns NULL if given phi node must be handled by means of extended phi node predication. If number of predecessors of phi-block is equal 2 and atleast one incoming edge is not critical original algorithm is used. (is_cond_scalar_reduction): Add 'extended' argument which signals that phi arguments must be evaluated through phi_has_two_different_args. (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. (get_predicate_for_edge): New function. (find_insertion_point): New function. (predicate_arbitrary_phi): New function. (predicate_extended_scalar_phi): New function. (predicate_all_scalar_phis): Add code to set-up gimple statement iterator for predication of extended scalar phi's for insertion. (insert_gimplified_predicates): Add test for non-predicated basic blocks that there are no gimplified statements to insert. Insert predicates at the block begining for extended if-conversion. (predicate_mem_writes): Invoke convert_name_to_cmp for extended predication to build mask. (combine_blocks): Pass flag_force_vectorize to predicate_bbs. (tree_if_conversion): Initialize flag_force_vectorize from current loop or outer loop (to support pragma omp declare).Do loop versioning for innermost loop marked with pragma omp simd. 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guent...@gmail.com>: > On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote: >> Hi All, >> >> We implemented additional support for pragma omp simd in part of >> extended if-conversion loops with such pragma. These extensions >> include: >> >> 1. All extensions are performed only if considered loop or its outer >> loop was marked with pragma omp simd (force_vectorize); For ordinary >> loops behavior was not changed. >> 2. Took off cfg restriction on basic block which can have more than 2 >> predecessors. >> 3. Put additional restriction on phi nodes which was missed in current >> design: >> all phi nodes must be in non-predicated basic block to conform >> semantic of COND_EXPR which is used for transformation. > > How is that so? If the PHI is predicated then its result will be used > in a PHI node again and thus we'd create a sequence of COND_EXPRs. > > No? > >> 4. Extend predication of phi nodes: phi may have more than 2 arguments >> with some limitations: >> - for phi nodes which have more than 2 arguments, but only two >> arguments are different and one of them has the only occurence, >> transformation to single COND_EXPR can be done. >> - if phi node has more different arguments and all edge predicates >> correspondent to phi-arguments are disjoint, a chain of COND_EXPR >> will be generated for it. In current design very simple check is used: >> check starting from end that two edges correspondent to neighbor >> arguments have common predecessor which is used for further check >> with next edge. >> These guarantee that phi predication will produce the correct result. > > Btw, you can think of these extensions as unfactoring a PHI node by > inserting forwarder blocks. Thus > > x = PHI <1(2), 1(3), 2(4)> > > becomes > > bb 5: <forwarder-from(2)-and(3)> > > x = PHI <1(5), 2(4)> > > and > > x = PHI <1(2), 2(3), 3(4)> > > becomes > > bb 5: > x' = PHI <1(2), 2(3)> > > b = PHI<x'(5), 3(4)> > > which means that 3) has to work. Note that we want this kind of > PHI transforms for out-of-SSA as well to reduce the number of > copies we need to insert on edges. > > Thus it would be nice if you implemented 4) in terms of a pre-pass > over the force_vect loops PHI nodes, applying that CFG transform. > And make 3) work properly if it doesn't already. > > It looks like you introduce a "negate predicate" to work around the > critical edge limitation? Please instead change if-conversion to > work with edge predicates (as opposed to BB predicates). > > Thanks, > Richard. > >> >> Here is example of such extended predication (compile with -march=core-avx2): >> #pragma omp simd safelen(8) >> for (i=0; i<512; i++) >> { >> float t = a[i]; >> if (t > 0 & t < 1.0e+17f) >> if (c[i] != 0) >> res += 1; >> } >> <bb 4>: >> # res_15 = PHI <res_1(5), 0(3)> >> # i_16 = PHI <i_11(5), 0(3)> >> # ivtmp_17 = PHI <ivtmp_14(5), 512(3)> >> t_5 = a[i_16]; >> _6 = t_5 > 0.0; >> _7 = t_5 < 9.9999998430674944e+16; >> _8 = _7 & _6; >> _ifc__28 = (unsigned int) _8; >> _10 = &c[i_16]; >> _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0; >> _9 = MASK_LOAD (_10, 0B, _ifc__36); >> _ifc__29 = _ifc__28 != 0 ? 1 : 0; >> _ifc__30 = (int) _ifc__29; >> _ifc__31 = _9 != 0 ? _ifc__30 : 0; >> _ifc__32 = _ifc__28 != 0 ? 1 : 0; >> _ifc__33 = (int) _ifc__32; >> _ifc__34 = _9 == 0 ? _ifc__33 : 0; >> _ifc__35 = _ifc__31 != 0 ? 1 : 0; >> res_1 = res_15 + _ifc__35; >> i_11 = i_16 + 1; >> ivtmp_14 = ivtmp_17 - 1; >> if (ivtmp_14 != 0) >> goto <bb 4>; >> >> Bootstrap and regression testing did not show any new failures. >> >> gcc/ChageLog >> >> 2014-06-25 Yuri Rumyantsev <ysrum...@gmail.com> >> >> * tree-if-conv.c (flag_force_vectorize): New variable. >> (struct bb_predicate_s): Add negate_predicate field. >> (bb_negate_predicate): New function. >> (set_bb_negate_predicate): New function. >> (bb_copy_predicate): New function. >> (add_stmt_to_bb_predicate_gimplified_stmts): New function. >> (init_bb_predicate): Add initialization of negate_predicate field. >> (reset_bb_predicate): Reset negate_predicate to NULL_TREE. >> (convert_name_to_cmp): New function. >> (get_type_for_cond): New function. >> (convert_bool_predicate): New function. >> (predicate_disjunction): New function. >> (predicate_conjunction): New function. >> (add_to_predicate_list): Add convert_bool argument. >> Add call of predicate_disjunction if convert_bool argument is true. >> (add_to_dst_predicate_list): Add convert_bool argument. >> Add early function exit if edge target block is always executed. >> Add call of predicate_conjunction if convert_bool argument is true. >> Pass convert_bool argument for add_to_predicate_list. >> (equal_phi_args): New function. >> (phi_has_two_different_args): New function. >> (phi_args_disjoint): New function. >> (if_convertible_phi_p): Accept phi nodes with more than two args >> for loops marked with pragma omp simd. Add check that phi nodes are >> in non-predicated basic blocks. >> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize. >> (all_edges_are_critical): New function. >> (if_convertible_bb_p): Allow bb has more than two predecessors if >> flag_force_vectorize was setup. Use call of all_edges_are_critical >> to reject block if-conversion with imcoming critical edges only if >> flag_force_vectorize was not setup. >> (walk_cond_tree): New function. >> (vect_bool_pattern_is_applicable): New function. >> (predicate_bbs): Add convert_bool argument that is used to transform >> comparison expressions of boolean type into conditional expressions >> with integral operands. If bool_conv argument is false or both >> outgoing edges are not critical old algorithm of predicate assignments >> is used, otherwise the following code was added: check on applicable >> of vect-bool-pattern recognition and trnasformation of >> (bool) x != 0 --> y = (int) x; x != 0; >> compute predicates for both outgoing edges one of which is critical >> one using 'normal' edge, i.e. compute true and false predicates using >> normal outgoing edge only; evaluated predicates are stored in >> predicate and negate_predicate fields of struct bb_predicate_s and >> negate_predicate of normal edge conatins predicate of critical edge, >> but generated gimplified statements are stored in their destination >> block fields. Additional argument 'convert_bool" is passed to >> add_to_dst_predicate_list and add_to_predicate_list. >> (if_convertible_loop_p_1): Call predicate_bbs with additional argument >> equal to false. >> (find_phi_replacement_condition): Extend function interface: >> it returns NULL if given phi node must be handled by means of >> extended phi node predication. If number of predecessors of phi-block >> is equal 2 and atleast one incoming edge is not critical original >> algorithm is used. >> (is_cond_scalar_reduction): Add 'extended' argument which signals that >> both phi arguments must be evaluated through phi_has_two_different_args. >> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond >> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction. >> (get_predicate_for_edge): New function. >> (find_insertion_point): New function. >> (predicate_phi_disjoint_args): New function. >> (predicate_extended_scalar_phi): New function. >> (predicate_all_scalar_phis): Add code to set-up gimple statement >> iterator for predication of extended scalar phi's for insertion. >> (insert_gimplified_predicates): Add test for non-predicated basic >> blocks that there are no gimplified statements to insert. Insert >> predicates at the block begining for extended if-conversion. >> (predicate_mem_writes): Invoke convert_name_to_cmp for extended >> predication to build mask. >> (combine_blocks): Pass flag_force_vectorize to predicate_bbs. >> (split_crit_edge): New function. >> (tree_if_conversion): Initialize flag_force_vectorize from current >> loop or outer loop (to support pragma omp declare). Invoke >> split_crit_edge for extended predication. Do loop versioning for >> innermost loop marked with pragma omp simd.
patch.1
Description: Binary data