Re: RFC [1/3] divmod transform v2

Prathamesh Kulkarni Fri, 21 Oct 2016 03:35:54 -0700

On 20 October 2016 at 15:02, Richard Biener <rguent...@suse.de> wrote:
> On Wed, 19 Oct 2016, Jeff Law wrote:
>
>> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote:
>> > Hi,
>> > After approval from Bernd Schmidt, I committed the patch to remove
>> > optab functions for
>> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block
>> > for divmod patch.
>> >
>> > This patch is mostly the same as previous one, except it drops
>> > targeting __udivmoddi4() because
>> > it gave undefined reference link error for calling __udivmoddi4() on
>> > aarch64-linux-gnu.
>> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4()
>> > isn't needed for the target
>> > (it was a bug in my patch that called __udivmoddi4() even though
>> > aarch64 supported hardware div).
>> >
>> > However this makes me wonder if it's guaranteed that __udivmoddi4()
>> > will be available for a target if it doesn't have hardware div and
>> > divmod insn and doesn't have target-specific libfunc for
>> > DImode divmod ? To be conservative, the attached patch doesn't
>> > generate call to __udivmoddi4.
>> >
>> > Passes bootstrap+test on x86_64-unknown-linux.
>> > Cross-tested on arm*-*-*, aarch64*-*-*.
>> > Verified that there are no regressions with SPEC2006 on
>> > x86_64-unknown-linux-gnu.
>> > OK to commit ?
>> >
>> > Thanks,
>> > Prathamesh
>> >
>> >
>> > divmod-v2-3-main.txt
>> >
>> >
>> > 2016-10-15  Prathamesh Kulkarni  <prathamesh.kulka...@linaro.org>
>> >         Kugan Vivekanandarajah  <kug...@linaro.org>
>> >         Jim Wilson  <jim.wil...@linaro.org>
>> >
>> >         * target.def: New hook expand_divmod_libfunc.
>> >         * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC
>> >         * doc/tm.texi: Regenerate.
>> >         * internal-fn.def: Add new entry for DIVMOD ifn.
>> >         * internal-fn.c (expand_DIVMOD): New.
>> >         * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h,
>> >         targhooks.h.
>> >         (widen_mul_stats): Add new field divmod_calls_inserted.
>> >         (target_supports_divmod_p): New.
>> >         (divmod_candidate_p): Likewise.
>> >         (convert_to_divmod): Likewise.
>> >         (pass_optimize_widening_mul::execute): Call
>> >         calculate_dominance_info(), renumber_gimple_stmt_uids() at
>> >         beginning of function. Call convert_to_divmod()
>> >         and record stats for divmod.
>> Starting with some high level design comments.  If these conflict with
>> comments from others, let me know and we'll work through the issues.
>>
>> I don't really like introducing code conditional on the target capabilities
>> this early in the gimple optimization pipeline.
>
> It's basically done right before RTL expansion
> (pass_optimize_widening_mul).
>
>> Would it be possible to always do the transformation to divmod in the gimple
>> optimizers, regardless of the target capabilities.  Then in the gimple->RTL
>> expanders make a final decision about divmod insn, libcall, or using div/mod
>> insns?
>
> The issue is that it hoists one or both the division or the modulo and
> if we don't do the transform we'd want to undo that code motion.
>
>> That would move all the target dependencies out of the gimple optimizers and
>> into the gimple->rtl expansion phase, which is the preferred place to start
>> introducing this kind of target dependency.
>>
>> With that background, I'm going to focus more on the identification of divmod
>> opportunities than the expansion bits.
>>
>>
>> >
>> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
>> > index a4a8e49..866c368 100644
>> > --- a/gcc/doc/tm.texi
>> > +++ b/gcc/doc/tm.texi
>> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets,
>> > please refer to
>> >  the hook implementation for how different fusion types are supported.
>> >  @end deftypefn
>> >
>> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx
>> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx
>> > *@var{quot}, rtx *@var{rem})
>> > +Define this hook for enabling divmod transform if the port does not have
>> > +hardware divmod insn but defines target-specific divmod libfuncs.
>> > +@end deftypefn
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
>> > index 265f1be..c4c387b 100644
>> > --- a/gcc/doc/tm.texi.in
>> > +++ b/gcc/doc/tm.texi.in
>> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first.
>> >
>> >  @hook TARGET_SCHED_FUSION_PRIORITY
>> >
>> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC
>> > +
>> >  @node Sections
>> >  @section Dividing the Output into Sections (Texts, Data, @dots{})
>> >  @c the above section title is WAY too long.  maybe cut the part between
>> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> > index 0b32d5f..42c6973 100644
>> > --- a/gcc/internal-fn.c
>> > +++ b/gcc/internal-fn.c
>> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall
>> > *call)
>> >    expand_ifn_atomic_compare_exchange (call);
>> >  }
>> >
>> > +/* Expand DIVMOD() using:
>> In general, we do not use () when referring to objects in comments.
>>
>> > + a) optab handler for udivmod/sdivmod if it is available.
>> > + b) If optab_handler doesn't exist, generate call to
>> > +    target-specific divmod libfunc.  */
>> > +
>> > +static void
>> > +expand_DIVMOD (internal_fn, gcall *call_stmt)
>> In general, don't use upper case for function names.  Lower case please.  But
>> I believe we should be pushing all the expansion stuff into the gimple->rtl
>> expansion code, so I haven't looked at this closely yet on the expectation
>> it'll likely move.
>>
>>
>>
>> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c
>> > index 0cea1a8..4f7bdb2 100644
>> > --- a/gcc/tree-ssa-math-opts.c
>> > +++ b/gcc/tree-ssa-math-opts.c
>> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi,
>> > gimple *stmt,
>> >    return true;
>> >  }
>> >
>> > +/* Return true if target has support for divmod.  */
>> > +
>> > +static bool
>> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, 
>> > machine_mode
>> > mode)
>> So this is the kind of code I expect to avoid in the gimple optimizers.
>> Instead the decision about whether to use a divmod insn, divmod libfunc or 
>> div
>> + synthesized mod, or separate div and mod insns should happen at gimple->rtl
>> expansion time.
>>
>>
>> > +
>> > +/* This function looks for:
>> > +   t1 = a TRUNC_DIV_EXPR b;
>> > +   t2 = a TRUNC_MOD_EXPR b;
>> > +   and transforms it to the following sequence:
>> > +   complex_tmp = DIVMOD (a, b);
>> > +   t1 = REALPART_EXPR(a);
>> > +   t2 = IMAGPART_EXPR(b);
>> > +   For conditions enabling the transform see divmod_candidate_p().
>> > +
>> > +   The pass works in two phases:
>> > +   1) Walk through all immediate uses of stmt's operand and find a
>> > +      TRUNC_DIV_EXPR with matching operands and if such a stmt is found 
>> > add
>> > +      it to stmts vector.
>> > +   2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block
>> > that
>> > +      dominates other div/mod stmts with same operands) and update entries
>> > in
>> > +      stmts vector to use return value of DIMOVD (REALEXPR_PART for div,
>> > +      IMAGPART_EXPR for mod).  */
>> > +
>> > +static bool
>> > +convert_to_divmod (gassign *stmt)
>> > +{
>> > +  if (!divmod_candidate_p (stmt))
>> > +    return false;
>> > +
>> > +  tree op1 = gimple_assign_rhs1 (stmt);
>> > +  tree op2 = gimple_assign_rhs2 (stmt);
>> > +
>> > +  imm_use_iterator use_iter;
>> > +  gimple *use_stmt;
>> > +  auto_vec<gimple *> stmts;
>> > +
>> > +  gimple *top_stmt = stmt;
>> > +  basic_block top_bb = gimple_bb (stmt);
>> > +
>> > +  /* Try to set top_stmt to "topmost" stmt
>> > +     with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt.
>> > */
>> > +
>> > +  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1)
>> > +    {
>> > +      if (is_gimple_assign (use_stmt)
>> > +     && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR
>> > +         || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR)
>> > +     && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0)
>> > +     && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0))
>> So why check for TRUNC_MOD_EXPR here?  ISTM that stmt is always going to be
>> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching
>> TRUNC_DIV_EXPR statement.
>>
>> The only way I could see TRUNC_MOD_EXPR being useful here would be if there 
>> is
>> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other
>> optimizer hasn't done its job.  Have you seen this to be useful in practice?
>
> Note that I've reviewed this parts already and we restructured things
> in this way, requiring to look for TRUNC_MOD_EXPR to properly handle
> finding a dominating trunc _or_ div and interatively doing so
> correctly if we have more than one pair.
>
>> > +   {
>> > +     if (stmt_can_throw_internal (use_stmt))
>> > +       continue;
>> > +
>> > +     basic_block bb = gimple_bb (use_stmt);
>> > +
>> > +     if (bb == top_bb)
>> > +       {
>> > +         if (gimple_uid (use_stmt) < gimple_uid (top_stmt))
>> > +           top_stmt = use_stmt;
>> > +       }
>> > +     else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb))
>> > +       {
>> > +         top_bb = bb;
>> > +         top_stmt = use_stmt;
>> > +       }
>> Do you want to break out of the immediate use loop once you've found a
>> TRUNC_DIV_EXPR statement with matching operands?
>>
>> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the
>> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR
>> statements in the IL, which presumably doesn't happen if the rest of the
>> optimizers are doing their job.
>>
>>
>>
>> > +   }
>> > +    }
>> > +
>> > +  if (top_stmt == stmt && stmt_can_throw_internal (top_stmt))
>> > +    return false;
>> Don't you just want
>> if (stmt_can_throw_internal (stmt))
>>   return false;
>>
>> Before the loop over the immediate uses, thus avoiding the loop if we've got 
>> a
>> TRUNC_MOD_EXPR that we can't optimize?
>>
>>
>>
>> > +
>> > +  tree top_op1 = gimple_assign_rhs1 (top_stmt);
>> > +  tree top_op2 = gimple_assign_rhs2 (top_stmt);
>> > +
>> > +  stmts.safe_push (top_stmt);
>> > +  bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR);
>> > +
>> > +  /* Ensure that gimple_bb (use_stmt) is dominated by top_bb.  */
>> ?!?  It's not clear why you'd need to do another immediate use loop here.
>>
>> ISTM at this point you've found two statements, one a TRUNC_DIV the other a
>> TRUNC_MOD with the same operands.  Given that, ISTM you just need to check 
>> the
>> dominance relationship of the blocks containing those statements.
>>
>> It really feels like you're going out of your way to handle cases where we
>> have redundant expressions in the IL.  Maybe I'm wrong. BUt if I'm right, I'd
>> rather see that time spent optimizing away those redundant expressions in
>> FRE/PRE/DOM.
>>
>> > +
>> > +  /* Create libcall to internal fn DIVMOD:
>> > +     divmod_tmp = DIVMOD (op1, op2).  */
>> > +
>> > +  gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2);
>> > +  tree res = make_temp_ssa_name (
>> > +           build_complex_type (TREE_TYPE (op1)),
>> > +           call_stmt, "divmod_tmp");
>> Please review the the GCC formatting guidelines.  This looks wrong.
>>
>> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)),
>>                                call_stmt, "divmod_tmp");
>>
>> Seems like the right formatting to me.
>>
>> > +
>> > +  /* Update all statements in stmts.
>> > +     if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs =
>> > REALPART_EXPR<divmod_tmp>
>> > +     if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs =
>> > IMAGPART_EXPR<divmod_tmp>.  */
>> I'd just emit a copy from RES to the appropriate lhs operand just after the
>> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR
>> statements.
>
> That sounds like a good idea.
Um sorry, not sure if I understood this part.
For eg:
t1 = x / y;
t2 = x % y;


do we want to transform it to:
complex_tmp = DIVMOD (x, y);
div_tmp = REALPART_EXPR<complex_tmp>
mod_tmp = IMAGPART_EXPR<complex_tmp>

and then replace each use of lhs (t1, t2 etc.) by either div_tmp
or mod_tmp (depending if the stmt was trunc_div_expr or trunc_mod_expr) ?

Thanks,
Prathamesh
>
>> Copy propagation should clean that up just fine and it simplifies your
>> implementation.
>
> There should be no copy to propagate that way.
>
>> > +
>> > +  bool cfg_changed = false;
>> > +  for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i)
>> > +    {
>> > +      tree new_rhs;
>> > +
>> > +      switch (gimple_assign_rhs_code (use_stmt))
>> > +   {
>> > +     case TRUNC_DIV_EXPR:
>> > +       new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res);
>> > +       break;
>> > +
>> > +     case TRUNC_MOD_EXPR:
>> > +       new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res);
>> > +       break;
>> > +
>> > +     default:
>> > +       gcc_unreachable ();
>> > +   }
>> > +
>> > +      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
>> > +      gimple_assign_set_rhs_from_tree (&gsi, new_rhs);
>> > +      update_stmt (use_stmt);
>> > +
>> > +      if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt))
>> > +   cfg_changed = true;
>> Does this ever happen given how you filter out throwing statements earlier?
>
> We filter out throwing stmts that are not "top".  The top one we replace
> with the divmod call and we can preserve EH for that (and thus handle
> the case where the original div/mod at this place may throw).
>
>>
>> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun)
>> >                 match_uaddsub_overflow (&gsi, stmt, code);
>> >               break;
>> >
>> > +           case TRUNC_MOD_EXPR:
>> > +             cfg_changed = convert_to_divmod (as_a<gassign *> (stmt));
>> > +             break;
>> > +
>> >             default:;
>> >             }
>> >         }
>> If I'm reading this correctly, ISTM that you only search for a divmod pair if
>> you first find the mod insn.  That's probably reasonable.  We have to find
>> both to have an optimization opportunity and I suspect we have more div than
>> mod insns, so you're presumably minimizing the search space with this
>> approach.  Can you confirm that was your intention here, or were you looking
>> to do something else?
>
> Yes, that was the idea.
>
> Richard.

Re: RFC [1/3] divmod transform v2

Reply via email to