On Fri, Oct 17, 2014 at 4:09 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Richard,
>
> I reworked the patch as you proposed, but I didn't understand what
> did you mean by:
>
>>So please rework the patch so critical edges are always handled
>>correctly.
>
> In current patch flag_force_vectorize is used (1) to reject phi nodes
> with more than 2 arguments; (2) to reject basic blocks with only
> critical incoming edges since support for extended predication of phi
> nodes will be in next patch.

I mean that (2) should not be rejected dependent on flag_force_vectorize.
It was rejected because if-cvt couldn't handle it correctly before but with
this patch this is fixed.  I see no reason to still reject this then even
for !flag_force_vectorize.

Rejecting PHIs with more than two arguments with flag_force_vectorize
is ok.

Richard.

> Could you please clarify your statement.
>
> I attached modified patch.
>
> ChangeLog:
>
> 2014-10-17  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> (flag_force_vectorize): New variable.
> (edge_predicate): New function.
> (set_edge_predicate): New function.
> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
> if destination block of edge is not always executed. Set-up predicate
> for critical edge.
> (if_convertible_phi_p): Accept phi nodes with more than two args
> if FLAG_FORCE_VECTORIZE was set-up.
> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
> (if_convertible_stmt_p): Fix up pre-function comments.
> (all_edges_are_critical): New function.
> (if_convertible_bb_p): Use call of all_preds_critical_p
> to reject block if-conversion with incoming critical edges only if
> FLAG_FORCE_VECTORIZE was not set-up.
> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
> to compute predicate instead of fold_build2_loc.
> Add zeroing of edge 'aux' field.
> (find_phi_replacement_condition): Extend function interface:
> it returns NULL if given phi node must be handled by means of
> extended phi node predication. If number of predecessors of phi-block
> is equal 2 and atleast one incoming edge is not critical original
> algorithm is used.
> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
> Nullify 'aux' field of edges for blocks with two successors.
>
>
>
>
> 2014-10-17 13:09 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Oct 16, 2014 at 5:42 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is reduced patch as you requested. All your remarks have been fixed.
>>> Could you please look at it ( I have already sent the patch with
>>> changes in add_to_predicate_list for review).
>>
>> +             if (dump_file && (dump_flags & TDF_DETAILS))
>> +               fprintf (dump_file, "More than two phi node args.\n");
>> +             return false;
>> +           }
>> +
>> +        }
>>
>> Excess vertical space.
>>
>>
>> +/* Assumes that BB has more than 2 predecessors.
>>
>> More than 1 predecessor?
>>
>> +   Returns false if at least one successor is not on critical edge
>> +   and true otherwise.  */
>> +
>> +static inline bool
>> +all_edges_are_critical (basic_block bb)
>> +{
>>
>> "all_preds_critical_p" would be a better name
>>
>> +  if (EDGE_COUNT (bb->preds) > 2)
>> +    {
>> +      if (!flag_force_vectorize)
>> +       return false;
>> +    }
>>
>> as I said in the last review I don't think we should restrict edge
>> predicates to flag_force_vectorize.  At least I can't see how
>> if-conversion is magically more expensive for that case?
>>
>> So please rework the patch so critical edges are always handled
>> correctly.
>>
>> Ok with that and the above suggested changes.
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks.
>>> Yuri.
>>> ChangeLog
>>> 2014-10-16  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> (flag_force_vectorize): New variable.
>>> (edge_predicate): New function.
>>> (set_edge_predicate): New function.
>>> (add_to_dst_predicate_list): Conditionally invoke add_to_predicate_list
>>> if destination block of edge is not always executed. Set-up predicate
>>> for critical edge.
>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>> if FLAG_FORCE_VECTORIZE was set-up.
>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>> (all_edges_are_critical): New function.
>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>> to reject block if-conversion with incoming critical edges only if
>>> FLAG_FORCE_VECTORIZE was not set-up.
>>> (predicate_bbs): Skip loop exit block also.Invoke build2_loc
>>> to compute predicate instead of fold_build2_loc.
>>> Add zeroing of edge 'aux' field.
>>> (find_phi_replacement_condition): Extend function interface:
>>> it returns NULL if given phi node must be handled by means of
>>> extended phi node predication. If number of predecessors of phi-block
>>> is equal 2 and atleast one incoming edge is not critical original
>>> algorithm is used.
>>> (tree_if_conversion): Temporary set-up FLAG_FORCE_VECTORIZE to false.
>>> Nullify 'aux' field of edges for blocks with two successors.
>>>
>>>
>>>
>>> 2014-10-15 13:50 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Mon, Oct 13, 2014 at 11:38 AM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> Here is updated patch (part1) for extended if conversion.
>>>>>
>>>>> Second part of patch will be sent later.
>>>>
>>>> Ok, I'm starting to look at this.  I'd still like you to split things up
>>>> more.
>>>>
>>>>  static inline void
>>>>  add_to_predicate_list (struct loop *loop, basic_block bb, tree nc)
>>>>  {
>>>> ...
>>>>
>>>> +      /* We use notion of cd equivalence to get simplier predicate for
>>>> +        join block, e.g. if join block has 2 predecessors with predicates
>>>> +        p1 & p2 and p1 & !p2, we'd like to get p1 for it instead of
>>>> +        p1 & p2 | p1 & !p2.  */
>>>> +      if (dom_bb != loop->header
>>>> +         && get_immediate_dominator (CDI_POST_DOMINATORS, dom_bb) == bb)
>>>> +       {
>>>> +         gcc_assert (flow_bb_inside_loop_p (loop, dom_bb));
>>>> +         bc = bb_predicate (dom_bb);
>>>> +         gcc_assert (!is_true_predicate (bc));
>>>>
>>>> these changes look worthwhile even for !flag_force_vectorize.  So please
>>>> split the change to add_to_predicate_list out and compute post-dominators
>>>> unconditionally.  Note that you should call free_dominance_info
>>>> (CDI_POST_DOMINATORS) at the end of if-conversion.
>>>>
>>>> +  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, e->dest))
>>>> +    add_to_predicate_list (loop, e->dest, cond);
>>>> +
>>>> +  /* If edge E is critical save predicate on it.  */
>>>> +  if (EDGE_COUNT (e->dest->preds) >= 2)
>>>> +    set_edge_predicate (e, cond);
>>>>
>>>> how do we know the edge is critical by this simple check?  Why not
>>>> simply always save edge predicates (well, you kind of do but omit
>>>> the case where e->src dominates e->dest).
>>>>
>>>> Btw, you can rely on edge->aux being NULL at the start of the
>>>> pass but need to clear it at the end (best use clear_aux_for_edges ()
>>>> for that).  So stuff like
>>>>
>>>> +         extract_true_false_edges_from_block (bb, &true_edge, 
>>>> &false_edge);
>>>> +         if (flag_force_vectorize)
>>>> +           true_edge->aux = false_edge->aux = NULL;
>>>>
>>>> shouldn't be necessary.
>>>>
>>>> I think the edge predicate handling should also be unconditionally
>>>> and not depend on flag_force_vectorize.
>>>>
>>>> +      /* The loop latch and loop exit block are always executed and
>>>> +        have no extra conditions to be processed: skip them.  */
>>>> +      if (bb == loop->latch
>>>> +         || bb_with_exit_edge_p (loop, bb))
>>>>
>>>> I don't think the edge stuff is true - given you still only reset the
>>>> loop->latch bb predicate the change looks broken.
>>>>
>>>> +         /* Fold_build2 can produce bool conversion which is not
>>>> +             supported by vectorizer, so re-build it without folding.
>>>> +            For example, such conversion is generated for sequence:
>>>> +               _Bool _7, _8, _9;
>>>> +               _7 = _6 != 13; _8 = _6 != 0; _9 = _8 & _9;
>>>> +               if (_9 != 0)  --> (bool)_9.  */
>>>> +
>>>> +         if (CONVERT_EXPR_P (c)
>>>> +             && TREE_CODE_CLASS (code) == tcc_comparison)
>>>>
>>>> I think you should simply use canonicalize_cond_expr_cond on the
>>>> folding result.  Or rather _not_ fold at all - we are taking the
>>>> operands from the GIMPLE condition unmodified after all.
>>>>
>>>> -         add_to_dst_predicate_list (loop, false_edge,
>>>> -                                    unshare_expr (cond), c2);
>>>> +         add_to_dst_predicate_list (loop, false_edge, unshare_expr (cond),
>>>> +                                    unshare_expr (c2));
>>>>
>>>> why is it necessary to unshare c2?
>>>>
>>>> Please split out the PHI-with-multi-arg handling  (I have not looked at
>>>> that in detail).
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>
>>>>> Changelog.
>>>>>
>>>>> 2014-10-13  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>
>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>> (flag_force_vectorize): New variable.
>>>>> (edge_predicate): New function.
>>>>> (set_edge_predicate): New function.
>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>> for join blocks if it exists.
>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>> destination block of edge is not always executed. Set-up predicate
>>>>> for critical edge.
>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>> (all_edges_are_critical): New function.
>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>> to reject block if-conversion with incoming critical edges only if
>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>> it returns NULL if given phi node must be handled by means of
>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>> algorithm is used.
>>>>> (get_predicate_for_edge): New function.
>>>>> (find_insertion_point): New function.
>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>> Invoke find_insertion_point to initialize gsi and
>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>> that extended predication must be applied).
>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>> predicates at the block begining for extended if-conversion.
>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>> for innermost loop marked with pragma omp simd and
>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>> for blocks with two successors.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> 2014-09-22 12:28 GMT+04:00 Yuri Rumyantsev <ysrum...@gmail.com>:
>>>>>> Richard,
>>>>>>
>>>>>> here is reduced patch (part.1) which was reduced almost twice.
>>>>>> Let's me also answer on your comments.
>>>>>>
>>>>>> 1. I really use edge field 'aux' to keep predicate for critical edges.
>>>>>> My previous code was not correct and now it looks like:
>>>>>>
>>>>>>   if (EDGE_COUNT (b->succs) == 1 || EDGE_COUNT (e->dest->preds) == 1)
>>>>>>     /* Edge E is not critical,  use predicate of edge source bb. */
>>>>>>     c = bb_predicate (b);
>>>>>>   else
>>>>>>     /* Edge E is critical and its aux field contains predicate.  */
>>>>>>     c = edge_predicate (e);
>>>>>>
>>>>>> 2. I completely delete all code related to creation of conditional
>>>>>> expressions and completely rely on bool pattern recognition in
>>>>>> vectorizer. But we need to delete all dead predicate computations
>>>>>> which are not used since they prevent vectorization. I will add this
>>>>>> local-dce function in next patch.
>>>>>> 3. I also did not include in this patch recognition of general
>>>>>> phi-nodes with two arguments only for which conversion of conditional
>>>>>> scalar reduction can be applied also.
>>>>>> Note that all these changes are applied for loop marked with pragma
>>>>>> omp simd only.
>>>>>>
>>>>>> 2014-09-22  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>
>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>> (flag_force_vectorize): New variable.
>>>>>> (edge_predicate): New function.
>>>>>> (set_edge_predicate): New function.
>>>>>> (convert_name_to_cmp): New function.
>>>>>> (add_to_predicate_list): Check unconditionally that bb is always
>>>>>> executed to early exit. Use predicate of cd-equivalent block
>>>>>> for join blocks if it exists.
>>>>>> (add_to_dst_predicate_list): Invoke add_to_predicate_list if
>>>>>> destination block of edge is not always executed. Set-up predicate
>>>>>> for critical edge.
>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>> if FLAG_FORCE_VECTORIZE was set-up.
>>>>>> (ifcvt_can_use_mask_load_store): Use FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_stmt_p): Fix up pre-function comments.
>>>>>> (all_edges_are_critical): New function.
>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>> FLAG_FORCE_VECTORIZE was set-up. Use call of all_edges_are_critical
>>>>>> to reject block if-conversion with incoming critical edges only if
>>>>>> FLAG_FORCE_VECTORIZE was not set-up.
>>>>>> (predicate_bbs): Skip loop exit block also. Add check that if
>>>>>> fold_build2 produces bool conversion, recompute predicate using
>>>>>> build2_loc. Add zeroing of edge 'aux' field under FLAG_FORCE_VECTORIZE.
>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>> FLAG_FORCE_VECTORIZE was set-up to calculate cd equivalent bb's.
>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>> algorithm is used.
>>>>>> (get_predicate_for_edge): New function.
>>>>>> (find_insertion_point): New function.
>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>> (predicate_all_scalar_phis): Introduce new variable BEFORE.
>>>>>> Invoke find_insertion_point to initialize gsi and
>>>>>> predicate_arbitrary_scalar_phi if TRUE_BB is NULL - it signals
>>>>>> that extended predication must be applied).
>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>> predicates at the block begining for extended if-conversion.
>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>> for innermost loop marked with pragma omp simd and
>>>>>> FLAG_TREE_LOOP_IF_CONVERT was not sett-up. Nullify 'aux' field of edges
>>>>>> for blocks with two successors.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2014-09-08 17:10 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>> On Fri, Aug 15, 2014 at 2:02 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>> wrote:
>>>>>>>> Richard!
>>>>>>>> Here is updated patch with the following changes:
>>>>>>>>
>>>>>>>> 1. Any restrictions on phi-function were eliminated for extended 
>>>>>>>> conversion.
>>>>>>>> 2.  Put predicate for critical edges to 'aux' field of edge, i.e.
>>>>>>>> negate_predicate was deleted.
>>>>>>>> 3. Deleted splitting of critical edges, i.e. both outgoing edges can
>>>>>>>> be critical.
>>>>>>>> 4. Use notion of cd-equivalence to set-up predicate for join basic
>>>>>>>> blocks to simplify it.
>>>>>>>> 5. I decided to not design pre-pass since it will lead generating
>>>>>>>> chain of cond expressions for phi-node if conversion, whereas for phi
>>>>>>>> of kind
>>>>>>>>   x = PHI <1(2), 1(3), 2(4)>
>>>>>>>> only one cond expression is required and this is considered as simple
>>>>>>>> optimization for arbitrary phi-function. More precise,
>>>>>>>> if phi-function have only two different arguments and one of them has
>>>>>>>> single occurrence, if- conversion is performed as if phi have only 2
>>>>>>>> arguments.
>>>>>>>> For arbitrary phi function a chain of cond expressions is produced.
>>>>>>>>
>>>>>>>> Updated patch is attached.
>>>>>>>>
>>>>>>>> Any comments will be appreciated.
>>>>>>>
>>>>>>> The patch is still very big and does multiple things at once which makes
>>>>>>> it hard to review.
>>>>>>>
>>>>>>> In addition to that it changes function singatures without updating
>>>>>>> the function comments.  For example what is the convert_bool
>>>>>>> argument doing to add_to_dst_predicate_list?  Why do we need
>>>>>>> all this added logic.
>>>>>>>
>>>>>>> You duplicate operand_equal_for_phi_arg_p.
>>>>>>>
>>>>>>> I think the code handling PHIs with more than two operands but
>>>>>>> only two unequal operands is useful generally, so that's an obvious
>>>>>>> candidate for splitting out into a separate patch.
>>>>>>>
>>>>>>> +   CONVERT_BOOL argument was added to convert bool predicate 
>>>>>>> computations
>>>>>>> +   which is not supported by vectorizer to int type through creating of
>>>>>>> +   conditional expressions.  */
>>>>>>>
>>>>>>> Example?  The vectorizer has patterns for bool predicate computations.
>>>>>>> This seems to be another feature that needs splitting out.
>>>>>>>
>>>>>>> The way you get around the critical edge parts looks awkward to me.
>>>>>>> Please either do _all_ predicates as edge predicates or simply
>>>>>>> split critical edges (of the respective loop body).
>>>>>>>
>>>>>>> I still think that an utility doing same PHI arg merging by introducing
>>>>>>> forwarder blocks would be nicer to have.
>>>>>>>
>>>>>>> I'd restructure the main tree_if_conversion function to apply these
>>>>>>> CFG pre-transforms when we are going to version the loop
>>>>>>> for if conversion (eventually transitioning to always doing that).
>>>>>>>
>>>>>>> So - please split up the patch.  It's way too big.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Richard.
>>>>>>>
>>>>>>>> 2014-08-15  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>
>>>>>>>> * tree-if-conv.c (cgraph.h): Add include file to detect function clone.
>>>>>>>> (flag_force_vectorize): New variable.
>>>>>>>> (edge_predicate): New function.
>>>>>>>> (set_edge_predicate): New function.
>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>> (get_type_for_cond): New function.
>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>> (predicate_disjunction): New function.
>>>>>>>> (predicate_conjunction): New function.
>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>> Use predicate of cd-equivalent block if convert_bool is true and
>>>>>>>> such bb exists; save it in static variable for further possible use.
>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>> Set-up predicate for crritical edge if convert_bool is true.
>>>>>>>> (equal_phi_args): New function.
>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>> if flag_force_vectorize wa set-up.
>>>>>>>> (ifcvt_can_use_mask_load_store): Add test on flag_force_vectorize.
>>>>>>>> (if_convertible_stmt_p): Allow calls of function clones if
>>>>>>>> flag_force_vectorize was set-up.
>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>> flag_force_vectorize was set-up. Use call of all_edges_are_critical
>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>> flag_force_vectorize was not set-up.
>>>>>>>> (walk_cond_tree): New function.
>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>> (predicate_bbs): Add convert_bool argument which is used to transform
>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>> with integral operands. If convert_bool argument was set-up and
>>>>>>>> vect bool pattern can be appied perform the following transformation:
>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>> Add check that if fold_build2 produces bool conversion if convert_bool
>>>>>>>> was set-up, recompute predicate using build2_loc. Additional argument
>>>>>>>> 'convert_bool" is passed to add_to_dst_predicate_list and
>>>>>>>> add_to_predicate_list.
>>>>>>>> (if_convertible_loop_p_1): Recompute POST_DOMINATOR tree if
>>>>>>>> flag_force_vectorize was set-up to calculate cd equivalent bb's.
>>>>>>>> Call predicate_bbs with additional argument equal to false.
>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>> algorithm is used.
>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals that
>>>>>>>> phi arguments must be evaluated through phi_has_two_different_args.
>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>> is SSA_NAME. Add 'false' argument to call of is_cond_scalar_reduction.
>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>> (find_insertion_point): New function.
>>>>>>>> (predicate_arbitrary_phi): New function.
>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>> predication to build mask.
>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>> loop or outer loop (to support pragma omp declare).Do loop versioning
>>>>>>>> for innermost loop marked with pragma omp simd.
>>>>>>>>
>>>>>>>> 2014-08-01 13:40 GMT+04:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>>> On Wed, Jun 25, 2014 at 4:06 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>> wrote:
>>>>>>>>>> Hi All,
>>>>>>>>>>
>>>>>>>>>> We implemented additional support for pragma omp simd in part of
>>>>>>>>>> extended if-conversion loops with such pragma. These extensions
>>>>>>>>>> include:
>>>>>>>>>>
>>>>>>>>>> 1. All extensions are performed only if considered loop or its outer
>>>>>>>>>>    loop was marked with pragma omp simd (force_vectorize); For 
>>>>>>>>>> ordinary
>>>>>>>>>>    loops behavior was not changed.
>>>>>>>>>> 2. Took off cfg restriction on basic block which can have more than 2
>>>>>>>>>>    predecessors.
>>>>>>>>>> 3. Put additional restriction on phi nodes which was missed in 
>>>>>>>>>> current design:
>>>>>>>>>>    all phi nodes must be in non-predicated basic block to conform
>>>>>>>>>>    semantic of COND_EXPR which is used for transformation.
>>>>>>>>>
>>>>>>>>> How is that so?  If the PHI is predicated then its result will be used
>>>>>>>>> in a PHI node again and thus we'd create a sequence of COND_EXPRs.
>>>>>>>>>
>>>>>>>>> No?
>>>>>>>>>
>>>>>>>>>> 4. Extend predication of phi nodes: phi may have more than 2 
>>>>>>>>>> arguments
>>>>>>>>>> with some limitations:
>>>>>>>>>>    - for phi nodes which have more than 2 arguments, but only two
>>>>>>>>>>    arguments are different and one of them has the only occurence,
>>>>>>>>>> transformation to  single COND_EXPR can be done.
>>>>>>>>>>    - if phi node has more different arguments and all edge predicates
>>>>>>>>>>    correspondent to phi-arguments are disjoint, a chain of COND_EXPR
>>>>>>>>>>    will be generated for it. In current design very simple check is 
>>>>>>>>>> used:
>>>>>>>>>>    check starting from end that two edges correspondent to neighbor
>>>>>>>>>> arguments have common predecessor which is used for further check
>>>>>>>>>> with next edge.
>>>>>>>>>>  These guarantee that phi predication will produce the correct 
>>>>>>>>>> result.
>>>>>>>>>
>>>>>>>>> Btw, you can think of these extensions as unfactoring a PHI node by
>>>>>>>>> inserting forwarder blocks.  Thus
>>>>>>>>>
>>>>>>>>>    x = PHI <1(2), 1(3), 2(4)>
>>>>>>>>>
>>>>>>>>> becomes
>>>>>>>>>
>>>>>>>>>   bb 5: <forwarder-from(2)-and(3)>
>>>>>>>>>
>>>>>>>>>   x = PHI <1(5), 2(4)>
>>>>>>>>>
>>>>>>>>> and
>>>>>>>>>
>>>>>>>>>   x = PHI <1(2), 2(3), 3(4)>
>>>>>>>>>
>>>>>>>>> becomes
>>>>>>>>>
>>>>>>>>>   bb 5:
>>>>>>>>>   x' = PHI <1(2), 2(3)>
>>>>>>>>>
>>>>>>>>>   b = PHI<x'(5), 3(4)>
>>>>>>>>>
>>>>>>>>> which means that 3) has to work.  Note that we want this kind of
>>>>>>>>> PHI transforms for out-of-SSA as well to reduce the number of
>>>>>>>>> copies we need to insert on edges.
>>>>>>>>>
>>>>>>>>> Thus it would be nice if you implemented 4) in terms of a pre-pass
>>>>>>>>> over the force_vect loops PHI nodes, applying that CFG transform.
>>>>>>>>> And make 3) work properly if it doesn't already.
>>>>>>>>>
>>>>>>>>> It looks like you introduce a "negate predicate" to work around the
>>>>>>>>> critical edge limitation?  Please instead change if-conversion to
>>>>>>>>> work with edge predicates (as opposed to BB predicates).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Richard.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is example of such extended predication (compile with 
>>>>>>>>>> -march=core-avx2):
>>>>>>>>>> #pragma omp simd safelen(8)
>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>   {
>>>>>>>>>>     float t = a[i];
>>>>>>>>>>     if (t > 0 & t < 1.0e+17f)
>>>>>>>>>>       if (c[i] != 0)
>>>>>>>>>> res += 1;
>>>>>>>>>>   }
>>>>>>>>>>   <bb 4>:
>>>>>>>>>>   # res_15 = PHI <res_1(5), 0(3)>
>>>>>>>>>>   # i_16 = PHI <i_11(5), 0(3)>
>>>>>>>>>>   # ivtmp_17 = PHI <ivtmp_14(5), 512(3)>
>>>>>>>>>>   t_5 = a[i_16];
>>>>>>>>>>   _6 = t_5 > 0.0;
>>>>>>>>>>   _7 = t_5 < 9.9999998430674944e+16;
>>>>>>>>>>   _8 = _7 & _6;
>>>>>>>>>>   _ifc__28 = (unsigned int) _8;
>>>>>>>>>>   _10 = &c[i_16];
>>>>>>>>>>   _ifc__36 = _ifc__28 != 0 ? 4294967295 : 0;
>>>>>>>>>>   _9 = MASK_LOAD (_10, 0B, _ifc__36);
>>>>>>>>>>   _ifc__29 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>   _ifc__30 = (int) _ifc__29;
>>>>>>>>>>   _ifc__31 = _9 != 0 ? _ifc__30 : 0;
>>>>>>>>>>   _ifc__32 = _ifc__28 != 0 ? 1 : 0;
>>>>>>>>>>   _ifc__33 = (int) _ifc__32;
>>>>>>>>>>   _ifc__34 = _9 == 0 ? _ifc__33 : 0;
>>>>>>>>>>   _ifc__35 = _ifc__31 != 0 ? 1 : 0;
>>>>>>>>>>   res_1 = res_15 + _ifc__35;
>>>>>>>>>>   i_11 = i_16 + 1;
>>>>>>>>>>   ivtmp_14 = ivtmp_17 - 1;
>>>>>>>>>>   if (ivtmp_14 != 0)
>>>>>>>>>>     goto <bb 4>;
>>>>>>>>>>
>>>>>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>>>>>>
>>>>>>>>>> gcc/ChageLog
>>>>>>>>>>
>>>>>>>>>> 2014-06-25  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>
>>>>>>>>>> * tree-if-conv.c (flag_force_vectorize): New variable.
>>>>>>>>>> (struct bb_predicate_s): Add negate_predicate field.
>>>>>>>>>> (bb_negate_predicate): New function.
>>>>>>>>>> (set_bb_negate_predicate): New function.
>>>>>>>>>> (bb_copy_predicate): New function.
>>>>>>>>>> (add_stmt_to_bb_predicate_gimplified_stmts): New function.
>>>>>>>>>> (init_bb_predicate): Add initialization of negate_predicate field.
>>>>>>>>>> (reset_bb_predicate): Reset negate_predicate to NULL_TREE.
>>>>>>>>>> (convert_name_to_cmp): New function.
>>>>>>>>>> (get_type_for_cond): New function.
>>>>>>>>>> (convert_bool_predicate): New function.
>>>>>>>>>> (predicate_disjunction): New function.
>>>>>>>>>> (predicate_conjunction): New function.
>>>>>>>>>> (add_to_predicate_list): Add convert_bool argument.
>>>>>>>>>> Add call of predicate_disjunction if convert_bool argument is true.
>>>>>>>>>> (add_to_dst_predicate_list): Add convert_bool argument.
>>>>>>>>>> Add early function exit if edge target block is always executed.
>>>>>>>>>> Add call of predicate_conjunction if convert_bool argument is true.
>>>>>>>>>> Pass convert_bool argument for add_to_predicate_list.
>>>>>>>>>> (equal_phi_args): New function.
>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>> (phi_args_disjoint): New function.
>>>>>>>>>> (if_convertible_phi_p): Accept phi nodes with more than two args
>>>>>>>>>> for loops marked with pragma omp simd. Add check that phi nodes are
>>>>>>>>>> in non-predicated basic blocks.
>>>>>>>>>> (ifcvt_can_use_mask_load_store): Use flag_force_vectorize.
>>>>>>>>>> (all_edges_are_critical): New function.
>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than two predecessors if
>>>>>>>>>> flag_force_vectorize was setup. Use call of all_edges_are_critical
>>>>>>>>>> to reject block if-conversion with imcoming critical edges only if
>>>>>>>>>> flag_force_vectorize was not setup.
>>>>>>>>>> (walk_cond_tree): New function.
>>>>>>>>>> (vect_bool_pattern_is_applicable): New function.
>>>>>>>>>> (predicate_bbs): Add convert_bool argument that is used to transform
>>>>>>>>>> comparison expressions of boolean type into conditional expressions
>>>>>>>>>> with integral operands. If bool_conv argument is false or both
>>>>>>>>>> outgoing edges are not critical old algorithm of predicate 
>>>>>>>>>> assignments
>>>>>>>>>> is used, otherwise the following code was added: check on applicable
>>>>>>>>>> of vect-bool-pattern recognition and trnasformation of
>>>>>>>>>> (bool) x != 0  --> y = (int) x; x != 0;
>>>>>>>>>> compute predicates for both outgoing edges one of which is critical
>>>>>>>>>> one using 'normal' edge, i.e. compute true and false predicates using
>>>>>>>>>> normal outgoing edge only; evaluated predicates are stored in
>>>>>>>>>> predicate and negate_predicate fields of struct bb_predicate_s and
>>>>>>>>>> negate_predicate of normal edge conatins predicate of critical edge,
>>>>>>>>>> but generated gimplified statements are stored in their destination
>>>>>>>>>> block fields. Additional argument 'convert_bool" is passed to
>>>>>>>>>> add_to_dst_predicate_list and add_to_predicate_list.
>>>>>>>>>> (if_convertible_loop_p_1): Call predicate_bbs with additional 
>>>>>>>>>> argument
>>>>>>>>>> equal to false.
>>>>>>>>>> (find_phi_replacement_condition): Extend function interface:
>>>>>>>>>> it returns NULL if given phi node must be handled by means of
>>>>>>>>>> extended phi node predication. If number of predecessors of phi-block
>>>>>>>>>> is equal 2 and atleast one incoming edge is not critical original
>>>>>>>>>> algorithm is used.
>>>>>>>>>> (is_cond_scalar_reduction): Add 'extended' argument which signals 
>>>>>>>>>> that
>>>>>>>>>> both phi arguments must be evaluated through 
>>>>>>>>>> phi_has_two_different_args.
>>>>>>>>>> (predicate_scalar_phi): Add invoсation of convert_name_to_cmp if cond
>>>>>>>>>> is SSA_NAME. Add 'false' argument to call of 
>>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>> (predicate_phi_disjoint_args): New function.
>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>> (predicate_all_scalar_phis): Add code to set-up gimple statement
>>>>>>>>>> iterator for predication of extended scalar phi's for insertion.
>>>>>>>>>> (insert_gimplified_predicates): Add test for non-predicated basic
>>>>>>>>>> blocks that there are no gimplified statements to insert. Insert
>>>>>>>>>> predicates at the block begining for extended if-conversion.
>>>>>>>>>> (predicate_mem_writes): Invoke convert_name_to_cmp for extended
>>>>>>>>>> predication to build mask.
>>>>>>>>>> (combine_blocks): Pass flag_force_vectorize to predicate_bbs.
>>>>>>>>>> (split_crit_edge): New function.
>>>>>>>>>> (tree_if_conversion): Initialize flag_force_vectorize from current
>>>>>>>>>> loop or outer loop (to support pragma omp declare). Invoke
>>>>>>>>>> split_crit_edge for extended predication. Do loop versioning for
>>>>>>>>>> innermost loop marked with pragma omp simd.

Reply via email to