On Wed, Dec 10, 2014 at 11:54 AM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
> Richard,
>
> Sorry that I forgot to delete debug dump from my fix.
> I have few questions about your comments.
>
> 1. You wrote :
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> path
>  Did you mean that I must combine predicate_scalar_phi and
> predicate_extended scalar phi to one function?
> Please note that if additional flag was not set up (i.e.
> aggressive_if_conv is false) extended predication is required more
> compile time since it builds hash_map.

It's compile-time complexity is reasonable enough even for
non-aggressive if-conversion.

> 2. About critical edge splitting.
>
> Did you mean that we should perform it (1) under aggressive_if_conv
> option only; (2) should we split all critical edges.
> Note that this leads to recomputing of topological order.

Well, I don't mind splitting all critical edges unconditionally, thus
do something like

Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c  (revision 218515)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2235,12 +2235,21 @@ pass_if_conversion::execute (function *f
   if (number_of_loops (fun) <= 1)
     return 0;

+  bool critical_edges_split_p = false;
   FOR_EACH_LOOP (loop, 0)
     if (flag_tree_loop_if_convert == 1
        || flag_tree_loop_if_convert_stores == 1
        || ((flag_tree_loop_vectorize || loop->force_vectorize)
            && !loop->dont_vectorize))
-      todo |= tree_if_conversion (loop);
+      {
+       if (!critical_edges_split_p)
+         {
+           split_critical_edges ();
+           critical_edges_split_p = true;
+           todo |= TODO_cleanup_cfg;
+         }
+       todo |= tree_if_conversion (loop);
+      }

 #ifdef ENABLE_CHECKING
   {

> It is worth noting that in current implementation bb's with 2
> predecessors and both are on critical edges are accepted without
> additional option.

Yes, I know.

tree-if-conv.c is a mess right now and if we can avoid adding more
to it and even fix the critical edge missed optimization with splitting
critical edges then I am all for that solution.

Richard.

> Thanks ahead.
> Yuri.
> 2014-12-09 18:20 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>> On Tue, Dec 9, 2014 at 2:11 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>> Richard,
>>>
>>> Here is updated patch2 with the following changes:
>>> 1. Delete functions  phi_has_two_different_args and find_insertion_point.
>>> 2. Use only one function for extended predication -
>>> predicate_extended_scalar_phi.
>>> 3. Save gsi before insertion of predicate computations for basic
>>> blocks if it has 2 predecessors and
>>> both incoming edges are critical or it gas more than 2 predecessors
>>> and at least one incoming edge
>>> is critical. This saved iterator can be used by extended phi predication.
>>>
>>> Here is motivated test-case which explains this point.
>>> Test-case is attached (t5.c) and it must be compiled with -O2
>>> -ftree-loop-vectorize -fopenmp options.
>>> The problem phi is in bb-7:
>>>
>>>   bb_5 (preds = {bb_4 }, succs = {bb_7 bb_9 })
>>>   {
>>>     <bb 5>:
>>>     xmax_edge_18 = xmax_edge_36 + 1;
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 9>;
>>>
>>>   }
>>>   bb_6 (preds = {bb_4 }, succs = {bb_7 bb_8 })
>>>   {
>>>     <bb 6>:
>>>     if (xmax_17 == xmax_27)
>>>       goto <bb 7>;
>>>     else
>>>       goto <bb 8>;
>>>
>>>   }
>>>   bb_7 (preds = {bb_6 bb_5 }, succs = {bb_11 })
>>>   {
>>>     <bb 7>:
>>>     # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>     xmax_edge_19 = xmax_edge_39 + 1;
>>>     goto <bb 11>;
>>>
>>>   }
>>>
>>> Note that both incoming edges to bb_7 are critical. If we comment out
>>> restoring gsi in predicate_all_scalar_phi:
>>> #if 0
>>>  if ((EDGE_COUNT (bb->preds) == 2 && all_preds_critical_p (bb))
>>>      || (EDGE_COUNT (bb->preds) > 2 && has_pred_critical_p (bb)))
>>>    gsi = bb_insert_point (bb);
>>>  else
>>> #endif
>>>    gsi = gsi_after_labels (bb);
>>>
>>> we will get ICE:
>>> t5.c: In function 'foo':
>>> t5.c:9:6: error: definition in block 4 follows the use
>>>  void foo (int n)
>>>       ^
>>> for SSA_NAME: _1 in statement:
>>> _52 = _1 & _3;
>>> t5.c:9:6: internal compiler error: verify_ssa failed
>>>
>>> smce predicate computations were inserted in bb_7.
>>
>> The issue is obviously that the predicates have already been emitted
>> in the target BB - that's of course the wrong place.  This is done
>> by insert_gimplified_predicates.
>>
>> This just shows how edge predicate handling is broken - we don't
>> seem to have a sequence of gimplified stmts for edge predicates
>> but push those to e->dest which makes this really messy.
>>
>> Rather than having a separate phase where we insert all
>> gimplified bb predicates we should do that on-demand when
>> predicating a PHI.
>>
>> Your patch writes to stderr - that's bad - use dump_file and guard
>> the printfs properly.
>>
>> You also still have two functions for PHI predication.  And the
>> new extended variant doesn't commonize the 2-args and general
>> paths.
>>
>> I'm not at all happy with this code.  It may be existing if-conv codes
>> fault but making it even worse is not an option.
>>
>> Again - what's wrong with simply splitting critical edges if
>> aggressive_if_conv?  I think that would very much simplify
>> things here.  Or alternatively use gsi_insert_on_edge and
>> commit edge insertions before merging the blocks.
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog is
>>>
>>> 2014-12-09  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>
>>> * tree-if-conv.c : Include hash-map.h.
>>> (struct bb_predicate_s): Add new field to save copy of gimple
>>> statement iterator.
>>> (bb_insert_point): New function.
>>> (set_bb_insert_point): New function.
>>> (has_pred_critical_p): New function.
>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>> AGGRESSIVE_IF_CONV is true.
>>> (if_convertible_bb_p): Delete check that bb has at least one
>>> non-critical incoming edge.
>>> (is_cond_scalar_reduction): Add arguments ARG_0, ARG_1 and EXTENDED.
>>> Allow interchange PHI arguments if EXTENDED is false.
>>> Change check that block containing reduction statement candidate
>>> is predecessor of phi-block since phi may have more than two arguments.
>>> (predicate_scalar_phi): Add new arguments for call of
>>> is_cond_scalar_reduction.
>>> (get_predicate_for_edge): New function.
>>> (struct phi_args_hash_traits): New type.
>>> (phi_args_hash_traits::hash): New function.
>>> (phi_args_hash_traits::equal_keys): New function.
>>> (gen_phi_arg_condition): New function.
>>> (predicate_extended_scalar_phi): New function.
>>> (predicate_all_scalar_phis): Add boolean variable EXTENDED and set it
>>> to true if BB containing phi has more than 2 predecessors or both
>>> incoming edges are critical. Invoke find_phi_replacement_condition and
>>> predicate_scalar_phi if EXTENDED is false. Use saved gsi if BB
>>> has 2 predecessors and both incoming edges are critical or it has more
>>> than 2 predecessors and atleast one incoming edge is critical.
>>> Use standard gsi_after_labels otherwise.
>>> Invoke predicate_extended_scalar_phi if EXTENDED is true.
>>> (insert_gimplified_predicates): Add bool variable EXTENDED_PREDICATION
>>> to save gsi before insertion of predicate computations. SEt-up it to
>>> true for BB with 2 predecessors and critical incoming edges either
>>>         number of predecessors is geater 2 and at least one incoming edge is
>>> critical.
>>> Add check that non-predicated block may have statements to insert.
>>> Insert predicate computation of BB just after label if
>>> EXTENDED_PREDICATION is true.
>>> (tree_if_conversion): Add initialization of AGGRESSIVE_IF_CONV which
>>> is copy of inner or outer loop force_vectorize field.
>>>
>>>
>>>
>>>
>>> 2014-12-04 16:37 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>> On Thu, Dec 4, 2014 at 2:15 PM, Yuri Rumyantsev <ysrum...@gmail.com> wrote:
>>>>> Richard,
>>>>>
>>>>> I did simple change by saving gsi iterator for each bb that has
>>>>> critical edges by adding additional field to bb_predicate_s:
>>>>>
>>>>> typedef struct bb_predicate_s {
>>>>>
>>>>>   /* The condition under which this basic block is executed.  */
>>>>>   tree predicate;
>>>>>
>>>>>   /* PREDICATE is gimplified, and the sequence of statements is
>>>>>      recorded here, in order to avoid the duplication of computations
>>>>>      that occur in previous conditions.  See PR44483.  */
>>>>>   gimple_seq predicate_gimplified_stmts;
>>>>>
>>>>>   /* Insertion point for blocks having incoming critical edges.  */
>>>>>   gimple_stmt_iterator gsi;
>>>>> } *bb_predicate_p;
>>>>>
>>>>> and this iterator is saved in  insert_gimplified_predicates before
>>>>> insertion code for predicate computation. I checked that this fix
>>>>> works.
>>>>
>>>> Huh?  I still wonder what the issue is with inserting everything
>>>> after the PHI we predicate.
>>>>
>>>> Well, your updated patch will come with testcases for the testsuite
>>>> that will hopefully fail if doing that.
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Now I am implementing merging of predicate_extended.. and
>>>>> predicate_arbitrary.. functions as you proposed.
>>>>>
>>>>> Best regards.
>>>>> Yuri.
>>>>>
>>>>> 2014-12-04 15:41 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>> On Tue, Dec 2, 2014 at 4:28 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>> wrote:
>>>>>>> Thanks Richard for your quick reply!
>>>>>>>
>>>>>>> 1. I agree that we can combine predicate_extended_ and
>>>>>>> predicate_arbitrary_ to one function as you proposed.
>>>>>>> 2. What is your opinion about using more simple decision about
>>>>>>> insertion point - if bb has use of phi result insert phi predication
>>>>>>> before it and at the bb end otherwise. I assume that critical edge
>>>>>>> splitting is not a good decision.
>>>>>>
>>>>>> Why not always insert before the use?  Which would be after labels,
>>>>>> what we do for two-arg PHIs.  That is, how can it be that you predicate
>>>>>> a PHI in BB1 and then for an edge predicate on one of its incoming
>>>>>> edges you get SSA uses with defs that are in BB1 itself?  That
>>>>>> can only happen for backedges but those you can't remove in any case.
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>
>>>>>>> Best regards.
>>>>>>> Yuri.
>>>>>>>
>>>>>>> 2014-12-02 16:28 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
>>>>>>>> On Mon, Dec 1, 2014 at 4:53 PM, Yuri Rumyantsev <ysrum...@gmail.com> 
>>>>>>>> wrote:
>>>>>>>>> Hi Richard,
>>>>>>>>>
>>>>>>>>> I resend you patch1 and patch2 with minor changes:
>>>>>>>>> 1. I renamed flag_force_vectorize to aggressive_if_conv.
>>>>>>>>> 2. Use static cast for the first argument of gimple_phi_arg_edge.
>>>>>>>>> I also very sorry that I sent you bad patch.
>>>>>>>>>
>>>>>>>>> Now let me answer on your questions related to second patch.
>>>>>>>>> 1. Why we need both predicate_extended_scalar_phi and
>>>>>>>>> predicate_arbitrary_scalar_phi?
>>>>>>>>>
>>>>>>>>> Let's consider the following simple test-case:
>>>>>>>>>
>>>>>>>>>   #pragma omp simd safelen(8)
>>>>>>>>>   for (i=0; i<512; i++)
>>>>>>>>>   {
>>>>>>>>>     float t = a[i];
>>>>>>>>>     if (t > 0.0f & t < 1.0e+17f)
>>>>>>>>>       if (c[i] != 0)  /* c is integer array. */
>>>>>>>>> res += 1;
>>>>>>>>>   }
>>>>>>>>>
>>>>>>>>> we can see the following phi node correspondent to res:
>>>>>>>>>
>>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5)>
>>>>>>>>>
>>>>>>>>> It is clear that we can optimize it to phi node with 2 arguments only
>>>>>>>>> and only one check can be used for phi predication (for reduction in
>>>>>>>>> our case), namely predicate of bb_5. In general case we can't do it
>>>>>>>>> even if we sort all phi argument values since we still have to produce
>>>>>>>>> a chain of cond expressions to perform phi predication (see comments
>>>>>>>>> for predicate_arbitrary_scalar_phi).
>>>>>>>>
>>>>>>>> How so?  We can always use !(condition) for the "last" value, thus
>>>>>>>> treat it as an 'else' case.  That even works for
>>>>>>>>
>>>>>>>> # res_1 = PHI <res_15(3), res_15(4), res_10(5), res_10(7)>
>>>>>>>>
>>>>>>>> where the condition for edges 5 and 7 can be computed as
>>>>>>>> ! (condition for 3 || condition for 4).
>>>>>>>>
>>>>>>>> Of course it is worthwhile to also sort single-occurances first
>>>>>>>> so your case gets just the condiiton for edge 5 and its inversion
>>>>>>>> used for edges 3 and 4 combined.
>>>>>>>>
>>>>>>>>> 2. Why we need to introduce find_insertion_point?
>>>>>>>>>  Let's consider another test-case extracted from 175.vpr ( t5.c is
>>>>>>>>> attached) and we can see that bb_7 and bb_9 containig phi nodes has
>>>>>>>>> only critical incoming edges and both contain code computing edge
>>>>>>>>> predicates, e.g.
>>>>>>>>>
>>>>>>>>> <bb 7>:
>>>>>>>>> # xmax_edge_30 = PHI <xmax_edge_36(6), xmax_edge_18(5)>
>>>>>>>>> _46 = xmax_17 == xmax_37;
>>>>>>>>> _47 = xmax_17 == xmax_27;
>>>>>>>>> _48 = _46 & _47;
>>>>>>>>> _53 = xmax_17 == xmax_37;
>>>>>>>>> _54 = ~_53;
>>>>>>>>> _55 = xmax_17 == xmax_27;
>>>>>>>>> _56 = _54 & _55;
>>>>>>>>> _57 = _48 | _56;
>>>>>>>>> xmax_edge_19 = xmax_edge_39 + 1;
>>>>>>>>> goto <bb 11>;
>>>>>>>>>
>>>>>>>>> It is evident that we can not put phi predication at the block
>>>>>>>>> beginning but need to put it after predicate computations.
>>>>>>>>> Note also that if there are no critical edges for phi arguments
>>>>>>>>> insertion point will be "after labels" Note also that phi result can
>>>>>>>>> have use in this block too, so we can't put predication code to the
>>>>>>>>> block end.
>>>>>>>>
>>>>>>>> So the issue is that predicate insertion for edge predicates does
>>>>>>>> not happen on the edge but somewhere else (generally impossible
>>>>>>>> for critical edges unless you split them).
>>>>>>>>
>>>>>>>> I think I've told you before that I prefer simple solutions to such 
>>>>>>>> issues,
>>>>>>>> like splitting the edge!  Certainly not involving a function walking
>>>>>>>> GENERIC expressions.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Let me know if you still have any questions.
>>>>>>>>>
>>>>>>>>> Best regards.
>>>>>>>>> Yuri.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 2014-11-28 15:43 GMT+03:00 Richard Biener 
>>>>>>>>> <richard.guent...@gmail.com>:
>>>>>>>>>> On Wed, Nov 12, 2014 at 2:35 PM, Yuri Rumyantsev 
>>>>>>>>>> <ysrum...@gmail.com> wrote:
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> Here is the second patch related to extended predication.
>>>>>>>>>>> Few comments which explain a main goal of design.
>>>>>>>>>>>
>>>>>>>>>>> 1. I don't want to insert any critical edge splitting since it may
>>>>>>>>>>> lead to less efficient binaries.
>>>>>>>>>>> 2. One special case of extended PHI node predication was introduced
>>>>>>>>>>> when #arguments is more than 2 but only two arguments are different
>>>>>>>>>>> and one argument has the only occurrence. For such PHI conditional
>>>>>>>>>>> scalar reduction is applied.
>>>>>>>>>>> This is correspondent to the following statement:
>>>>>>>>>>>     if (q1 && q2 && q3) var++
>>>>>>>>>>>  New function phi_has_two_different_args was introduced to detect 
>>>>>>>>>>> such phi.
>>>>>>>>>>> 3. Original algorithm for PHI predication used assumption that at
>>>>>>>>>>> least one incoming edge for blocks containing PHI is not critical - 
>>>>>>>>>>> it
>>>>>>>>>>> guarantees that all computations related to predicate of normal edge
>>>>>>>>>>> are already inserted above this block and
>>>>>>>>>>> code related to PHI predication can be inserted at the beginning of
>>>>>>>>>>> block. But this is not true for critical edges for which predicate
>>>>>>>>>>> computations are  in the block where code for phi predication must 
>>>>>>>>>>> be
>>>>>>>>>>> inserted. So new function find_insertion_point is introduced which 
>>>>>>>>>>> is
>>>>>>>>>>> simply found out the last statement in block defining predicates
>>>>>>>>>>> correspondent to all incoming edges and insert phi predication code
>>>>>>>>>>> after it (with some minor exceptions).
>>>>>>>>>>
>>>>>>>>>> Unfortunately the patch doesn't apply for me - I get
>>>>>>>>>>
>>>>>>>>>> patch: **** malformed patch at line 505: @@ -1720,6 +2075,8 @@
>>>>>>>>>> predicate_all_scalar_phis (struct loop *loop)
>>>>>>>>>>
>>>>>>>>>> a few remarks nevertheless.  I don't see how we need both
>>>>>>>>>> predicate_extended_scalar_phi and predicate_arbitrary_scalar_phi.
>>>>>>>>>> Couldn't we simply sort an array of (edge, value) pairs after value
>>>>>>>>>> and handle equal values specially in predicate_extended_scalar_phi?
>>>>>>>>>> That would even make PHI <a, a, b, c, c> more optimal.
>>>>>>>>>>
>>>>>>>>>> I don't understand the need for find_insertion_point.  All SSA names
>>>>>>>>>> required for the predicates are defined upward - and the complex CFG
>>>>>>>>>> is squashed to a single basic-block, thus the defs will dominate the
>>>>>>>>>> inserted code if you insert after labels just like for the other 
>>>>>>>>>> case.
>>>>>>>>>> Or what am I missing?  ("flattening" of the basic-blocks of course 
>>>>>>>>>> needs
>>>>>>>>>> to happen in dominator order - but I guess that happens already?)
>>>>>>>>>>
>>>>>>>>>> I'd like the extended PHI handling to be enablable by a flag even
>>>>>>>>>> for !force-vectorization - I've seen cases with 3 PHI args multiple
>>>>>>>>>> times that would have been nice to vectorize.  I suggest to
>>>>>>>>>> add -ftree-loop-if-convert-aggressive for this.  We can do this as
>>>>>>>>>> followup, but please rename the local flag_force_vectorize flag
>>>>>>>>>> to something less looking like a flag, like simply 'aggressive'.
>>>>>>>>>>
>>>>>>>>>> Otherwise patch 2 looks ok to me.
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ChangeLog:
>>>>>>>>>>>
>>>>>>>>>>> 2014-10-24  Yuri Rumyantsev  <ysrum...@gmail.com>
>>>>>>>>>>>
>>>>>>>>>>> * tree-if-conv.c (ifcvt_can_use_mask_load_store): Use
>>>>>>>>>>> FLAG_FORCE_VECTORIZE instead of loop flag.
>>>>>>>>>>> (if_convertible_bb_p): Allow bb has more than 2 predecessors if
>>>>>>>>>>> FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (if_convertible_bb_p): Delete check that bb has at least one
>>>>>>>>>>> non-critical incoming edge.
>>>>>>>>>>> (phi_has_two_different_args): New function.
>>>>>>>>>>> (is_cond_scalar_reduction): Add argument EXTENDED to choose access
>>>>>>>>>>> to phi arguments. Invoke phi_has_two_different_args to get phi
>>>>>>>>>>> arguments if EXTENDED is true. Change check that block
>>>>>>>>>>> containing reduction statement candidate is predecessor
>>>>>>>>>>> of phi-block since phi may have more than two arguments.
>>>>>>>>>>> (convert_scalar_cond_reduction): Add argument BEFORE to insert
>>>>>>>>>>> statement before/after gsi point.
>>>>>>>>>>> (predicate_scalar_phi): Add argument false (which means non-extended
>>>>>>>>>>> predication) to call of is_cond_scalar_reduction. Add argument
>>>>>>>>>>> true (which correspondent to argument BEFORE) to call of
>>>>>>>>>>> convert_scalar_cond_reduction.
>>>>>>>>>>> (get_predicate_for_edge): New function.
>>>>>>>>>>> (predicate_arbitrary_scalar_phi): New function.
>>>>>>>>>>> (predicate_extended_scalar_phi): New function.
>>>>>>>>>>> (find_insertion_point): New function.
>>>>>>>>>>> (predicate_all_scalar_phis): Add two boolean variables EXTENDED and
>>>>>>>>>>> BEFORE. Initialize EXTENDED to true if BB containing phi has more
>>>>>>>>>>> than 2 predecessors or both incoming edges are critical. Invoke
>>>>>>>>>>> find_phi_replacement_condition and predicate_scalar_phi or
>>>>>>>>>>> find_insertion_point and predicate_extended_scalar_phi depending on
>>>>>>>>>>> EXTENDED value.
>>>>>>>>>>> (insert_gimplified_predicates): Add check that non-predicated block
>>>>>>>>>>> may have statements to insert. Insert predicate of BB just after 
>>>>>>>>>>> label
>>>>>>>>>>> if FLAG_FORCE_VECTORIZE is true.
>>>>>>>>>>> (tree_if_conversion): Add initialization of FLAG_FORCE_VECTORIZE 
>>>>>>>>>>> which
>>>>>>>>>>> is copy of inner or outer loop field force_vectorize.

Reply via email to