On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Richard,
>
> Here is updated patch2 with the following changes:
> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
> 2. Use only one function for extended predication -
> predicate_extended_scalar_phi.
> 3. Save gsi before insertion of predicate computations for basic
> blocks if it has 2 predecessors and
> both incoming edges are critical or it gas more than 2 predecessors
> and at least one incoming edge
> is critical. This saved iterator can be used by extended phi predication.
>
> Here is motivated test-case which explains this point.
> Test-case is attached (t5.c) and it must be compiled with -O2
> -ftree-loop-vectorize -fopenmp options.
> The problem phi is in bb-7:
>
>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>   {
>     <bb 5>:
>     xmax_edge_18 = xmax_edge_36 + 1;
>     if (xmax_17 == xmax_27)
>       goto <bb 7>;
>     else
>       goto <bb 9>;
>
>   }
>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>   {
>     <bb 6>:
>     if (xmax_17 == xmax_27)
>       goto <bb 7>;
>     else
>       goto <bb 8>;
>
>   }
>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>   {
>     <bb 7>:
>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>     xmax_edge_19 = xmax_edge_39 + 1;
>     goto <bb 11>;
>
>   }
>
> Note that both incoming edges to bb_7 are critical. If we comment out
> restoring gsi in predicate_all_scalar_phi:
> #if 0
>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>    gsi = bb_insert_point (bb);
>  else
> #endif
>    gsi = gsi_after_labels (bb);
>
> we will get ICE:
> t5.c: In function 'foo':
> t5.c:9:6: error: definition in block 4 follows the use
>  void foo (int n)
>       ^
> for SSA_NAME: _1 in statement:
> _52 = _1 & _3;
> t5.c:9:6: internal compiler error: verify_ssa failed
>
> smce predicate computations were inserted in bb_7.

The issue is obviously that the predicates have already been emitted
in the target BB - that's of course the wrong place.  This is done
by insert_gimplified_predicates.

This just shows how edge predicate handling is broken - we don't
seem to have a sequence of gimplified stmts for edge predicates
but push those to e->dest which makes this really messy.

Rather than having a separate phase where we insert all
gimplified bb predicates we should do that on-demand when
predicating a PHI.

Your patch writes to stderr - that's bad - use dump_file and guard
the printfs properly.

You also still have two functions for PHI predication.  And the
new extended variant doesn't commonize the 2-args and general
paths.

I'm not at all happy with this code.  It may be existing if-conv codes
fault but making it even worse is not an option.

Again - what's wrong with simply splitting critical edges if
aggressive_if_conv?  I think that would very much simplify
things here.  Or alternatively use gsi_insert_on_edge and
commit edge insertions before merging the blocks.

Thanks,
Richard.

> ChangeLog is
>
> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>
> * tree-if-conv.c : Include hash-map.h.
> (struct bb_predicate_s): Add new field to save copy of gimple
> statement iterator.
> (bb_insert_point): New function.
> (set_bb_insert_point): New function.
> (has_pred_critical_p): New function.
> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
> AGGRESSIVE_IF_CONV is true.
> (if_convertible_bb_p): Delete check that bb has at least one
> non-critical incoming edge.
> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
> Allow interchange PHI arguments if EXTENDED is false.
> Change check that block containing reduction statement candidate
> is predecessor of phi-block since phi may have more than two arguments.
> (predicate_scalar_phi): Add new arguments for call of
> is_cond_scalar_reduction.
> (get_predicate_for_edge): New function.
> (struct phi_args_hash_traits): New type.
> (phi_args_hash_traits::hash): New function.
> (phi_args_hash_traits::equal_keys): New function.
> (gen_phi_arg_condition): New function.
> (predicate_extended_scalar_phi): New function.
> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
> to true if BB containing phi has more than 2 predecessors or both
> incoming edges are critical. Invoke find_phi_replacement_condition and
> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
> has 2 predecessors and both incoming edges are critical or it has more
> than 2 predecessors and atleast one incoming edge is critical.
> Use standard gsi_after_labels otherwise.
> Invoke predicate_extended_scalar_phi if EXTENDED is true.
> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
> to save gsi before insertion of predicate computations. SEt-up it to
> true for BB with 2 predecessors and critical incoming edges either
>         number of predecessors is geater 2 and at least one incoming edge is
> critical.
> Add check that non-predicated block may have statements to insert.
> Insert predicate computation of BB just after label if
> EXTENDED_PREDICATION is true.
> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
> is copy of inner or outer loop force_vectorize field.
>
>
>
>
> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> I did simple change by saving gsi iterator for each bb that has
>>> critical edges by adding additional field to bb_predicate_s:
>>>
>>> typedef struct bb_predicate_s {
>>>
>>>   /* The condition under which this basic block is executed.  */
>>>   tree predicate;
>>>
>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>      recorded here, in order to avoid the duplication of computations
>>>      that occur in previous conditions.  See PR44483.  */
>>>   gimple_seq predicate_gimplified_stmts;
>>>
>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>   gimple_stmt_iterator gsi;
>>> } *bb_predicate_p;
>>>
>>> and this iterator is saved in  insert_gimplified_predicates before
>>> insertion code for predicate computation. I checked that this fix
>>> works.
>>
>> Huh?  I still wonder what the issue is with inserting everything
>> after the PHI we predicate.
>>
>> Well, your updated patch will come with testcases for the testsuite
>> that will hopefully fail if doing that.
>>
>> Richard.
>>
>>>
>>> Now I am implementing merging of predicate_extended.. and
>>> predicate_arbitrary.. functions as you proposed.
>>>
>>> Best regards.
>>> Yuri.
>>>
>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>> Thanks Richard for your quick reply!
>>>>>
>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>> 2. What is your opinion about using more simple decision about
>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>> splitting is not a good decision.
>>>>
>>>> Why not always insert before the use?  Which would be after labels,
>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>> can only happen for backedges but those you can't remove in any case.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Hi Richard,
>>>>>>>
>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>
>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>
>>>>>>> Let's consider the following simple test-case:
>>>>>>>
>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>   for (i=0; i<512; i++)
>>>>>>>   {
>>>>>>>     float t = a[i];
>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>> res += 1;
>>>>>>>   }
>>>>>>>
>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>
>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>
>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>
>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>> treat it as an 'else' case.  That even works for
>>>>>>
>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>
>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>> ! (condition for 3 || condition for 4).
>>>>>>
>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>> used for edges 3 and 4 combined.
>>>>>>
>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>> predicates, e.g.
>>>>>>>
>>>>>>> <bb 7>:
>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>> _48 = _46 & _47;
>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>> _54 = ~_53;
>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>> _56 = _54 & _55;
>>>>>>> _57 = _48 | _56;
>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>> goto <bb 11>;
>>>>>>>
>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>> beginning but need to put it after predicate computations.
>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>> block end.
>>>>>>
>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>> for critical edges unless you split them).
>>>>>>
>>>>>> I think I've told you before that I prefer simple solutions to such 
>>>>>> issues,
>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>> GENERIC expressions.
>>>>>>
>>>>>> Thanks,
>>>>>> Richard.
>>>>>>
>>>>>>> Let me know if you still have any questions.
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>
>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>> lead to less efficient binaries.
>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>> scalar reduction is applied.
>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>  New function phi_has_two_different_args was introduced to detect 
>>>>>>>>> such phi.
>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - it
>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>> are already inserted above this block and
>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>> computations are  in the block where code for phi predication must be
>>>>>>>>> inserted. So new function find_insertion_point is introduced which is
>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>
>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>
>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>
>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>
>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>> inserted code if you insert after labels just like for the other case.
>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course 
>>>>>>>> needs
>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>
>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>
>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>
>>>>>>>>> ChangeLog:
>>>>>>>>>
>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>> non-critical incoming edge.
>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>> statement before/after gsi point.
>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>> EXTENDED value.
>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>> may have statements to insert. Insert predicate of BB just after label
>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE which
>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Reply via email to