https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77498

--- Comment #5 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> CCing Bin, he was looking into PRE/predcom as well AFAIR.  predictive
> commoning here performs unrolling to be able to avoid some loop-carried
> dependencies
> while PRE has the larger distances covered by for example
> 
> <bb 6> [85.00%]:
> # prephitmp_656 = PHI <_125(6), pretmp_655(5)>
> # prephitmp_674 = PHI <prephitmp_656(6), pretmp_673(5)>
> 
> this kind of loop carried PHIs should be a hint for a tree level unroller
> to perform unrolling (just in case they literally appear in source for
> example).
> OTOH if unrolling can solve the RA problem then the it must be solvable not
> unrolled as well?  Note that with predcom we end up with 11 pointer IVs while
> with PRE we have just one (but use 20 others from the outer loop...) -
> possibly
> the versioning predcom performs makes IVO not do any outer loop IVO.  Using
> -fschedule-insns -fsched-pressure helps somewhat but not much.
> 
> So it looks like a RA related issue and IVO is as much relevant as PRE doing
> predictive commoning at -O2 (and at -O3 doing predcoms job but worse in this
> case).
> 
> During PHI translation we can tame this down to a level pre this rev. again,
> for example with the following.  But ideally we'd compute antic and do
> insertion
> for the full dataflow problem and only apply this "cost modeling" during
> elimination to not lose secondary level transforms that are profitable
> (also below we do not know whether we need to insert a PHI for the value in
> the end).
> 
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  (revision 244484)
> +++ gcc/tree-ssa-pre.c  (working copy)
> @@ -1465,16 +1465,16 @@ phi_translate_1 (pre_expr expr, bitmap_s
>               {
>                 /* For non-CONSTANTs we have to make sure we can eventually
>                    insert the expression.  Which means we need to have a
> -                  leader for it.  */
> -               if (constant->kind != CONSTANT)
> +                  leader for it.  Avoid doing this across backedges though.
> */
> +               if (constant->kind == CONSTANT)
> +                 return constant;
> +               else if (! dominated_by_p (CDI_DOMINATORS, pred, phiblock))
>                   {
>                     unsigned value_id = get_expr_value_id (constant);
>                     constant = find_leader_in_sets (value_id, set1, set2);
>                     if (constant)
>                       return constant;
>                   }
> -               else
> -                 return constant;
>               }
>  
>             tree result = vn_nary_op_lookup_pieces (newnary->length,
> 
> 
> But as said, a whole different question is whether we want PRE to add IVs at
> all
> (but we do have some testcases requesting exactly that, for example
> gcc.dg/tree-ssa/pr71347.c or ssa-pre-23.c requesting store-motion w/o
> actually sinking the store).
> 
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  (revision 244484)
> +++ gcc/tree-ssa-pre.c  (working copy)
> @@ -4290,6 +4290,31 @@ eliminate_dom_walker::before_dom_childre
>                                                VN_INFO_RANGE_INFO (lhs));
>             }
>  
> +         if (sprime
> +             && TREE_CODE (sprime) == SSA_NAME
> +             && do_pre
> +             && loop_outer (b->loop_father)
> +             && has_zero_uses (sprime)
> +             && bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime)))
> +           {
> +             gimple *def_stmt = SSA_NAME_DEF_STMT (sprime);
> +             basic_block def_bb = gimple_bb (def_stmt);
> +             if (gimple_code (def_stmt) == GIMPLE_PHI
> +                 && def_bb->loop_father->header == def_bb)
> +               {
> +                 bool ok = true;
> +                 edge_iterator ei;
> +                 edge e;
> +                 FOR_EACH_EDGE (e, ei, def_bb->preds)
> +                   if (dominated_by_p (CDI_DOMINATORS, e->src, e->dest)
> +                       && TREE_CODE (PHI_ARG_DEF_FROM_EDGE (def_stmt, e))
> == SSA_NAME)
> +                     ok = false;
> +                 /* Don't keep sprime available.  */
> +                 if (!ok)
> +                   sprime = NULL_TREE;
> +               }
> +           }
> +
>           /* Inhibit the use of an inserted PHI on a loop header when
>              the address of the memory reference is a simple induction
>              variable.  In other cases the vectorizer won't do anything

I am trying to model register pressure and use that information to direct
predcom.  So far it detects only one case 436.cactusADM but does improve a lot.
 Though it's hard to model cost of pre_expr, but for loop carries ones, we may
be able to simply control the number using pressure information too.  According
to the report, this is a register pressure issue caused by PRE, I will check if
that can help.  Thanks.

Reply via email to