Re: [PATCH 2/3] Extended if-conversion

Richard Biener Fri, 19 Dec 2014 03:45:45 -0800

On Thu, Dec 18, 2014 at 2:45 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Richard,
>
> I am sending you full patch (~1000 lines) but if you need only patch.1
> and patch.2 will let me know and i'll send you reduced patch.
>
> Below are few comments regarding your remarks for patch.3.
>
> 1. I deleted sub-phase ifcvt_local_dce since I did not find test-case
> when dead code elimination is required to vectorize loop, i.e. dead
> statement is marked as relevant.
> 2. You wrote:
>> The "retry" code also looks odd - why do you walk the BB multiple
>> times instead of just doing sth like
>>
>>  while (!has_single_use (lhs))
>>    {
>>      gimple copy = ifcvt_split_def_stmt (def_stmt);
>>      ifcvt_walk_pattern_tree (copy);
>>    }
>>
>> thus returning the copy you create and re-process it (the copy should
>> now have a single-use).
>
> The problem is that not only top SSA_NAME (lhs) may have multiple uses
> but some intermediate variables too. For example, for the following
> test-case
>
> float a[1000];
> int c[1000];
>
> int foo()
> {
>   int i, res = 0;
> #pragma omp simd safelen(8)
>   for (i=0; i<512; i++)
>   {
>     float t = a[i];
>     if (t > 0.0f & t < 1.0e+17f)
>       if (c[i] != 0)
> res += 1;
>   }
>   return res;
> }
>
> After combine_blocks we have the following bb:
>
> <bb 3>:
> # res_15 = PHI <res_1(7), 0(15)>
> # i_16 = PHI <i_11(7), 0(15)>
> # ivtmp_14 = PHI <ivtmp_13(7), 512(15)>
> t_5 = a[i_16];
> _6 = t_5 > 0.0;
> _7 = t_5 < 9.9999998430674944e+16;
> _8 = _6 & _7;
> _10 = &c[i_16];
> _ifc__32 = _8 ? 4294967295 : 0;
> _9 = MASK_LOAD (_10, 0B, _ifc__32);
> _28 = _8;
> _29 = _9 != 0;
> _30 = _28 & _29;
> _ifc__31 = _30 ? 1 : 0;
> res_1 = res_15 + _ifc__31;
> i_11 = i_16 + 1;
> ivtmp_13 = ivtmp_14 - 1;
> if (ivtmp_13 != 0)
>   goto <bb 7>;
> else
>   goto <bb 8>;
>
> and we can see that _8 has multiple uses. Also note that after splitting of
> _8 = _6 & _7
> we also get multiple uses for definition of  _6 and _7. So I used this
> iterative algorithm as the simplest one.


But it walks the entire pattern again and again while you only need to
ensure you walk the pattern tree of the now single-use DEF again
(in fact, rather than replacing a random USE in ifcvt_split_def_stmt
you should pass down the user_operand_p that you need to make
single-use).

> I think it would be nice to re-use some utility from tree-vect-patterns.c
> for stmt_is_root_of_bool_pattern.
>
> I assume that function stmt_is_root_of_bool_pattern can be simplified
> to check on COND_EXPR only since PHI predication and memory access
> predication produced only such statements,i.e. it can look like
>
> static bool
> stmt_is_root_of_bool_pattern (gimple stmt, tree *var)
> {
>   enum tree_code code;
>   tree lhs, rhs;
>
>   code = gimple_assign_rhs_code (stmt);
>   if (code == COND_EXPR)
>     {
>       rhs = gimple_assign_rhs1 (stmt);
>       if (TREE_CODE (rhs) != SSA_NAME)
> return false;
>       *var = rhs;
>       return true;
>     }
>   return false;
> }
>
> I also did few minor changes in patch.2.
>
> 3. You can also notice that I inserted code in tree_if_conversion to
> do loop version if explicit option "-ftree-loop-if-convert" was not
> passed to compiler, i.e. we perform if-conversion for loop
> vectorization only and if it does not take place, we should delete
> if-converted version of loop.
> What is your opinion?

Overall part 1 and part 2 look good to me, predicate_scalar_phi
looks in need of some refactoring to avoid duplicate code.  We can
do that a followup.

Part 3 still needs the iteration to be resolved and make the use we
actually care about single-use, not a random one so we can avoid
iterating completely.

Richard.

> Thanks.
> Yuri.
>
> 2014-12-17 18:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Tue, Dec 16, 2014 at 4:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> Here is updated patch which includes
>>> (1) split critical edges for aggressive if conversion.
>>> (2) delete all stuff related to support of critical edge predication.
>>> (3) only one function - predicate_scalar_phi performs predication.
>>> (4) function find_phi_replacement_condition was deleted since it was
>>> included in predicate_scalar_phi for phi with two arguments.
>>>
>>> I checked that patch works in stress testing mode, i.e. with
>>> aggressive if conversion by default.
>>>
>>> What is your opinion?
>>
>> Looks ok overall, but please simply do
>>
>>   FOR_EACH_EDGE (e, ei, bb->succs)
>>     if (EDGE_CRITICAL_P (e) && e->dest->loop_father == loop)
>>       split_edge (e);
>>
>> for all blocks apart from the latch.
>>
>> Can you please send a combined patch up to this one?  Looking at
>> the incremental diff is somewhat hard.  Thus a patch including all
>> patches from patch1 to this one.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2014-12-11 11:59 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Wed, Dec 10, 2014 at 4:22 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>> wrote:
>>>>> Richard,
>>>>>
>>>>> Thanks for your reply!
>>>>>
>>>>> I didn't understand your point:
>>>>>
>>>>> Well, I don't mind splitting all critical edges unconditionally
>>>>>
>>>>> but you do it unconditionally in proposed patch.
>>>>
>>>> I don't mind means I am fine with it.
>>>>
>>>>> Also I assume that
>>>>> call of split_critical_edges() can break ssa. For example, we can
>>>>> split headers of loops, loop exit blocks etc.
>>>>
>>>> How does that "break SSA"?  You mean loop-closed SSA?  I'd
>>>> be surprised if so but that may be possible.
>>>>
>>>>> I prefer to do something
>>>>> more loop-specialized, e.g. call edge_split() for critical edges
>>>>> outgoing from bb ending with GIMPLE_COND stmt (assuming that edge
>>>>> destination bb belongs to loop).
>>>>
>>>> That works for me as well but it is more complicated to implement.
>>>> Ideally you'd only split one edge if you find a block with only critical
>>>> predecessors (where we'd currently give up).  But note that this
>>>> requires re-computation of ifc_bbs in if_convertible_loop_p_1 and it
>>>> will change loop->num_nodes so we have to be more careful in
>>>> constructing the loop calling if_convertible_bb_p.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> 2014-12-10 17:31 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Richard,
>>>>>>>
>>>>>>> Sorry that I forgot to delete debug dump from my fix.
>>>>>>> I have few questions about your comments.
>>>>>>>
>>>>>>> 1. You wrote :
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> path
>>>>>>>  Did you mean that I must combine predicate_scalar_phi and
>>>>>>> predicate_extended scalar phi to one function?
>>>>>>> Please note that if additional flag was not set up (i.e.
>>>>>>> aggressive_if_conv is false) extended predication is required more
>>>>>>> compile time since it builds hash_map.
>>>>>>
>>>>>> It's compile-time complexity is reasonable enough even for
>>>>>> non-aggressive if-conversion.
>>>>>>
>>>>>>> 2. About critical edge splitting.
>>>>>>>
>>>>>>> Did you mean that we should perform it (1) under aggressive_if_conv
>>>>>>> option only; (2) should we split all critical edges.
>>>>>>> Note that this leads to recomputing of topological order.
>>>>>>
>>>>>> Well, I don't mind splitting all critical edges unconditionally, thus
>>>>>> do something like
>>>>>>
>>>>>> Index: gcc/tree-if-conv.c
>>>>>> ===================================================================
>>>>>> --- gcc/tree-if-conv.c  (revision 218515)
>>>>>> +++ gcc/tree-if-conv.c  (working copy)
>>>>>> @@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
>>>>>>    if (number_of_loops (fun) <= 1)
>>>>>>      return 0;
>>>>>>
>>>>>> +  bool critical_edges_split_p = false;
>>>>>>    FOR_EACH_LOOP (loop, 0)
>>>>>>      if (flag_tree_loop_if_convert == 1
>>>>>>         || flag_tree_loop_if_convert_stores == 1
>>>>>>         || ((flag_tree_loop_vectorize || loop->force_vectorize)
>>>>>>             && !loop->dont_vectorize))
>>>>>> -      todo |= tree_if_conversion (loop);
>>>>>> +      {
>>>>>> +       if (!critical_edges_split_p)
>>>>>> +         {
>>>>>> +           split_critical_edges ();
>>>>>> +           critical_edges_split_p = true;
>>>>>> +           todo |= TODO_cleanup_cfg;
>>>>>> +         }
>>>>>> +       todo |= tree_if_conversion (loop);
>>>>>> +      }
>>>>>>
>>>>>>  #ifdef ENABLE_CHECKING
>>>>>>    {
>>>>>>
>>>>>>> It is worth noting that in current implementation bb's with 2
>>>>>>> predecessors and both are on critical edges are accepted without
>>>>>>> additional option.
>>>>>>
>>>>>> Yes, I know.
>>>>>>
>>>>>> tree-if-conv.c is a mess right now and if we can avoid adding more
>>>>>> to it and even fix the critical edge missed optimization with splitting
>>>>>> critical edges then I am all for that solution.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Thanks ahead.
>>>>>>> Yuri.
>>>>>>> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Richard,
>>>>>>>>>
>>>>>>>>> Here is updated patch2 with the following changes:
>>>>>>>>> 1. Delete functions  phi_has_two_different_args and 
>>>>>>>>> find_insertion_point.
>>>>>>>>> 2. Use only one function for extended predication -
>>>>>>>>> predicate_extended_scalar_phi.
>>>>>>>>> 3. Save gsi before insertion of predicate computations for basic
>>>>>>>>> blocks if it has 2 predecessors and
>>>>>>>>> both incoming edges are critical or it gas more than 2 predecessors
>>>>>>>>> and at least one incoming edge
>>>>>>>>> is critical. This saved iterator can be used by extended phi 
>>>>>>>>> predication.
>>>>>>>>>
>>>>>>>>> Here is motivated test-case which explains this point.
>>>>>>>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>>>>>>>> -ftree-loop-vectorize -fopenmp options.
>>>>>>>>> The problem phi is in bb-7:
>>>>>>>>>
>>>>>>>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>>>>>>>   {
>>>>>>>>>     <bb 5>:
>>>>>>>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 9>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>>>>>>>   {
>>>>>>>>>     <bb 6>:
>>>>>>>>>     if (xmax_17 == xmax_27)
>>>>>>>>>       goto <bb 7>;
>>>>>>>>>     else
>>>>>>>>>       goto <bb 8>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>>>>>>>   {
>>>>>>>>>     <bb 7>:
>>>>>>>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>     goto <bb 11>;
>>>>>>>>>
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>>>>>>>> restoring gsi in predicate_all_scalar_phi:
>>>>>>>>> #if 0
>>>>>>>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>>>>>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>>>>>>>    gsi = bb_insert_point (bb);
>>>>>>>>>  else
>>>>>>>>> #endif
>>>>>>>>>    gsi = gsi_after_labels (bb);
>>>>>>>>>
>>>>>>>>> we will get ICE:
>>>>>>>>> t5.c: In function 'foo':
>>>>>>>>> t5.c:9:6: error: definition in block 4 follows the use
>>>>>>>>>  void foo (int n)
>>>>>>>>>       ^
>>>>>>>>> for SSA_NAME: _1 in statement:
>>>>>>>>> _52 = _1 & _3;
>>>>>>>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>>>>>>>
>>>>>>>>> smce predicate computations were inserted in bb_7.
>>>>>>>>
>>>>>>>> The issue is obviously that the predicates have already been emitted
>>>>>>>> in the target BB - that's of course the wrong place.  This is done
>>>>>>>> by insert_gimplified_predicates.
>>>>>>>>
>>>>>>>> This just shows how edge predicate handling is broken - we don't
>>>>>>>> seem to have a sequence of gimplified stmts for edge predicates
>>>>>>>> but push those to e->dest which makes this really messy.
>>>>>>>>
>>>>>>>> Rather than having a separate phase where we insert all
>>>>>>>> gimplified bb predicates we should do that on-demand when
>>>>>>>> predicating a PHI.
>>>>>>>>
>>>>>>>> Your patch writes to stderr - that's bad - use dump_file and guard
>>>>>>>> the printfs properly.
>>>>>>>>
>>>>>>>> You also still have two functions for PHI predication.  And the
>>>>>>>> new extended variant doesn't commonize the 2-args and general
>>>>>>>> paths.
>>>>>>>>
>>>>>>>> I'm not at all happy with this code.  It may be existing if-conv codes
>>>>>>>> fault but making it even worse is not an option.
>>>>>>>>
>>>>>>>> Again - what's wrong with simply splitting critical edges if
>>>>>>>> aggressive_if_conv?  I think that would very much simplify
>>>>>>>> things here.  Or alternatively use gsi_insert_on_edge and
>>>>>>>> commit edge insertions before merging the blocks.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> ChangeLog is
>>>>>>>>>
>>>>>>>>> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>
>>>>>>>>> * tree-if-conv.c : Include hash-map.h.
>>>>>>>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>>>>>>>> statement iterator.
>>>>>>>>> (bb_insert_point): New function.
>>>>>>>>> (set_bb_insert_point): New function.
>>>>>>>>> (has_pred_critical_p): New function.
>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>> AGGRESSIVE_IF_CONV is true.
>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>> non-critical incoming edge.
>>>>>>>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>>>>>>>> Allow interchange PHI arguments if EXTENDED is false.
>>>>>>>>> Change check that block containing reduction statement candidate
>>>>>>>>> is predecessor of phi-block since phi may have more than two 
>>>>>>>>> arguments.
>>>>>>>>> (predicate_scalar_phi): Add new arguments for call of
>>>>>>>>> is_cond_scalar_reduction.
>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>> (struct phi_args_hash_traits): New type.
>>>>>>>>> (phi_args_hash_traits::hash): New function.
>>>>>>>>> (phi_args_hash_traits::equal_keys): New function.
>>>>>>>>> (gen_phi_arg_condition): New function.
>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>>>>>>>> to true if BB containing phi has more than 2 predecessors or both
>>>>>>>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>>>>>>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>>>>>>>> has 2 predecessors and both incoming edges are critical or it has more
>>>>>>>>> than 2 predecessors and atleast one incoming edge is critical.
>>>>>>>>> Use standard gsi_after_labels otherwise.
>>>>>>>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>>>>>>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>>>>>>>> to save gsi before insertion of predicate computations. SEt-up it to
>>>>>>>>> true for BB with 2 predecessors and critical incoming edges either
>>>>>>>>>         number of predecessors is geater 2 and at least one incoming 
>>>>>>>>> edge is
>>>>>>>>> critical.
>>>>>>>>> Add check that non-predicated block may have statements to insert.
>>>>>>>>> Insert predicate computation of BB just after label if
>>>>>>>>> EXTENDED_PREDICATION is true.
>>>>>>>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>>>>>>>> is copy of inner or outer loop force_vectorize field.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-12-04 16:37 GMT+03:00 Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Richard,
>>>>>>>>>>>
>>>>>>>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>>>>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>>>>>>>
>>>>>>>>>>> typedef struct bb_predicate_s {
>>>>>>>>>>>
>>>>>>>>>>>   /* The condition under which this basic block is executed.  */
>>>>>>>>>>>   tree predicate;
>>>>>>>>>>>
>>>>>>>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>>>>>>>      recorded here, in order to avoid the duplication of 
>>>>>>>>>>> computations
>>>>>>>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>>>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>>>>>>>
>>>>>>>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>>>>>>>   gimple_stmt_iterator gsi;
>>>>>>>>>>> } *bb_predicate_p;
>>>>>>>>>>>
>>>>>>>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>>>>>>>> insertion code for predicate computation. I checked that this fix
>>>>>>>>>>> works.
>>>>>>>>>>
>>>>>>>>>> Huh?  I still wonder what the issue is with inserting everything
>>>>>>>>>> after the PHI we predicate.
>>>>>>>>>>
>>>>>>>>>> Well, your updated patch will come with testcases for the testsuite
>>>>>>>>>> that will hopefully fail if doing that.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Now I am implementing merging of predicate_extended.. and
>>>>>>>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>>>>>>>
>>>>>>>>>>> Best regards.
>>>>>>>>>>> Yuri.
>>>>>>>>>>>
>>>>>>>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener 
>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev 
>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>> Thanks Richard for your quick reply!
>>>>>>>>>>>>>
>>>>>>>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>>>>>>>> insertion point - if bb has use of phi result insert phi 
>>>>>>>>>>>>> predication
>>>>>>>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>>>>>>>> splitting is not a good decision.
>>>>>>>>>>>>
>>>>>>>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>>>>>>>> what we do for two-arg PHIs.  That is, how can it be that you 
>>>>>>>>>>>> predicate
>>>>>>>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>>>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>>>>>>>> can only happen for backedges but those you can't remove in any 
>>>>>>>>>>>> case.
>>>>>>>>>>>>
>>>>>>>>>>>> Richard.
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>
>>>>>>>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener 
>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>> Hi Richard,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>>>>>>>> 2. Use static cast for the first argument of 
>>>>>>>>>>>>>>> gimple_phi_arg_edge.
>>>>>>>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>>>>>>>   {
>>>>>>>>>>>>>>>     float t = a[i];
>>>>>>>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>>>>>>>> res += 1;
>>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is clear that we can optimize it to phi node with 2 
>>>>>>>>>>>>>>> arguments only
>>>>>>>>>>>>>>> and only one check can be used for phi predication (for 
>>>>>>>>>>>>>>> reduction in
>>>>>>>>>>>>>>> our case), namely predicate of bb_5. In general case we can't 
>>>>>>>>>>>>>>> do it
>>>>>>>>>>>>>>> even if we sort all phi argument values since we still have to 
>>>>>>>>>>>>>>> produce
>>>>>>>>>>>>>>> a chain of cond expressions to perform phi predication (see 
>>>>>>>>>>>>>>> comments
>>>>>>>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How so?  We can always use !(condition) for the "last" value, 
>>>>>>>>>>>>>> thus
>>>>>>>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c 
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes 
>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>> only critical incoming edges and both contain code computing 
>>>>>>>>>>>>>>> edge
>>>>>>>>>>>>>>> predicates, e.g.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <bb 7>:
>>>>>>>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _48 = _46 & _47;
>>>>>>>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>>>>>>>> _54 = ~_53;
>>>>>>>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>>>>>>>> _56 = _54 & _55;
>>>>>>>>>>>>>>> _57 = _48 | _56;
>>>>>>>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>>>>>>>> goto <bb 11>;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>>>>>>>> insertion point will be "after labels" Note also that phi 
>>>>>>>>>>>>>>> result can
>>>>>>>>>>>>>>> have use in this block too, so we can't put predication code to 
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> block end.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>>>>>>>> for critical edges unless you split them).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think I've told you before that I prefer simple solutions to 
>>>>>>>>>>>>>> such issues,
>>>>>>>>>>>>>> like splitting the edge!  Certainly not involving a function 
>>>>>>>>>>>>>> walking
>>>>>>>>>>>>>> GENERIC expressions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best regards.
>>>>>>>>>>>>>>> Yuri.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener 
>>>>>>>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev 
>>>>>>>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since 
>>>>>>>>>>>>>>>>> it may
>>>>>>>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>>>>>>>> 2. One special case of extended PHI node predication was 
>>>>>>>>>>>>>>>>> introduced
>>>>>>>>>>>>>>>>> when #arguments is more than 2 but only two arguments are 
>>>>>>>>>>>>>>>>> different
>>>>>>>>>>>>>>>>> and one argument has the only occurrence. For such PHI 
>>>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>>>>>>>  New function phi_has_two_different_args was introduced to 
>>>>>>>>>>>>>>>>> detect such phi.
>>>>>>>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption 
>>>>>>>>>>>>>>>>> that at
>>>>>>>>>>>>>>>>> least one incoming edge for blocks containing PHI is not 
>>>>>>>>>>>>>>>>> critical - it
>>>>>>>>>>>>>>>>> guarantees that all computations related to predicate of 
>>>>>>>>>>>>>>>>> normal edge
>>>>>>>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>>>>>>>> code related to PHI predication can be inserted at the 
>>>>>>>>>>>>>>>>> beginning of
>>>>>>>>>>>>>>>>> block. But this is not true for critical edges for which 
>>>>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>> computations are  in the block where code for phi predication 
>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>> inserted. So new function find_insertion_point is introduced 
>>>>>>>>>>>>>>>>> which is
>>>>>>>>>>>>>>>>> simply found out the last statement in block defining 
>>>>>>>>>>>>>>>>> predicates
>>>>>>>>>>>>>>>>> correspondent to all incoming edges and insert phi 
>>>>>>>>>>>>>>>>> predication code
>>>>>>>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>>>>>>>> predicate_extended_scalar_phi and 
>>>>>>>>>>>>>>>> predicate_arbitrary_scalar_phi.
>>>>>>>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after 
>>>>>>>>>>>>>>>> value
>>>>>>>>>>>>>>>> and handle equal values specially in 
>>>>>>>>>>>>>>>> predicate_extended_scalar_phi?
>>>>>>>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA 
>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>> required for the predicates are defined upward - and the 
>>>>>>>>>>>>>>>> complex CFG
>>>>>>>>>>>>>>>> is squashed to a single basic-block, thus the defs will 
>>>>>>>>>>>>>>>> dominate the
>>>>>>>>>>>>>>>> inserted code if you insert after labels just like for the 
>>>>>>>>>>>>>>>> other case.
>>>>>>>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of 
>>>>>>>>>>>>>>>> course needs
>>>>>>>>>>>>>>>> to happen in dominator order - but I guess that happens 
>>>>>>>>>>>>>>>> already?)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag 
>>>>>>>>>>>>>>>> even
>>>>>>>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args 
>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do 
>>>>>>>>>>>>>>>> this as
>>>>>>>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>>>>>>>> to something less looking like a flag, like simply 
>>>>>>>>>>>>>>>> 'aggressive'.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Richard.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors 
>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose 
>>>>>>>>>>>>>>>>> access
>>>>>>>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means 
>>>>>>>>>>>>>>>>> non-extended
>>>>>>>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables 
>>>>>>>>>>>>>>>>> EXTENDED and
>>>>>>>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has 
>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. 
>>>>>>>>>>>>>>>>> Invoke
>>>>>>>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi 
>>>>>>>>>>>>>>>>> depending on
>>>>>>>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated 
>>>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>>>> may have statements to insert. Insert predicate of BB just 
>>>>>>>>>>>>>>>>> after label
>>>>>>>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>>>>>>>> (tree_if_conversion): Add initialization of 
>>>>>>>>>>>>>>>>> FLAG_FORCE_VECTORIZE which
>>>>>>>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Re: [PATCH 2/3] Extended if-conversion

Reply via email to