On 20 October 2016 at 15:02, Richard Biener <rguent...@suse.de> wrote: > On Wed, 19 Oct 2016, Jeff Law wrote: > >> On 10/15/2016 11:59 PM, Prathamesh Kulkarni wrote: >> > Hi, >> > After approval from Bernd Schmidt, I committed the patch to remove >> > optab functions for >> > sdivmod_optab and udivmod_optab in optabs.def, which removes the block >> > for divmod patch. >> > >> > This patch is mostly the same as previous one, except it drops >> > targeting __udivmoddi4() because >> > it gave undefined reference link error for calling __udivmoddi4() on >> > aarch64-linux-gnu. >> > It appears aarch64 has hardware insn for DImode div, so __udivmoddi4() >> > isn't needed for the target >> > (it was a bug in my patch that called __udivmoddi4() even though >> > aarch64 supported hardware div). >> > >> > However this makes me wonder if it's guaranteed that __udivmoddi4() >> > will be available for a target if it doesn't have hardware div and >> > divmod insn and doesn't have target-specific libfunc for >> > DImode divmod ? To be conservative, the attached patch doesn't >> > generate call to __udivmoddi4. >> > >> > Passes bootstrap+test on x86_64-unknown-linux. >> > Cross-tested on arm*-*-*, aarch64*-*-*. >> > Verified that there are no regressions with SPEC2006 on >> > x86_64-unknown-linux-gnu. >> > OK to commit ? >> > >> > Thanks, >> > Prathamesh >> > >> > >> > divmod-v2-3-main.txt >> > >> > >> > 2016-10-15 Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> >> > Kugan Vivekanandarajah <kug...@linaro.org> >> > Jim Wilson <jim.wil...@linaro.org> >> > >> > * target.def: New hook expand_divmod_libfunc. >> > * doc/tm.texi.in: Add hook for TARGET_EXPAND_DIVMOD_LIBFUNC >> > * doc/tm.texi: Regenerate. >> > * internal-fn.def: Add new entry for DIVMOD ifn. >> > * internal-fn.c (expand_DIVMOD): New. >> > * tree-ssa-math-opts.c: Include optabs-libfuncs.h, tree-eh.h, >> > targhooks.h. >> > (widen_mul_stats): Add new field divmod_calls_inserted. >> > (target_supports_divmod_p): New. >> > (divmod_candidate_p): Likewise. >> > (convert_to_divmod): Likewise. >> > (pass_optimize_widening_mul::execute): Call >> > calculate_dominance_info(), renumber_gimple_stmt_uids() at >> > beginning of function. Call convert_to_divmod() >> > and record stats for divmod. >> Starting with some high level design comments. If these conflict with >> comments from others, let me know and we'll work through the issues. >> >> I don't really like introducing code conditional on the target capabilities >> this early in the gimple optimization pipeline. > > It's basically done right before RTL expansion > (pass_optimize_widening_mul). > >> Would it be possible to always do the transformation to divmod in the gimple >> optimizers, regardless of the target capabilities. Then in the gimple->RTL >> expanders make a final decision about divmod insn, libcall, or using div/mod >> insns? > > The issue is that it hoists one or both the division or the modulo and > if we don't do the transform we'd want to undo that code motion. > >> That would move all the target dependencies out of the gimple optimizers and >> into the gimple->rtl expansion phase, which is the preferred place to start >> introducing this kind of target dependency. >> >> With that background, I'm going to focus more on the identification of divmod >> opportunities than the expansion bits. >> >> >> > >> > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi >> > index a4a8e49..866c368 100644 >> > --- a/gcc/doc/tm.texi >> > +++ b/gcc/doc/tm.texi >> > @@ -7078,6 +7078,11 @@ This is firstly introduced on ARM/AArch64 targets, >> > please refer to >> > the hook implementation for how different fusion types are supported. >> > @end deftypefn >> > >> > +@deftypefn {Target Hook} void TARGET_EXPAND_DIVMOD_LIBFUNC (rtx >> > @var{libfunc}, machine_mode @var{mode}, rtx @var{op0}, rtx @var{op1}, rtx >> > *@var{quot}, rtx *@var{rem}) >> > +Define this hook for enabling divmod transform if the port does not have >> > +hardware divmod insn but defines target-specific divmod libfuncs. >> > +@end deftypefn >> > + >> > @node Sections >> > @section Dividing the Output into Sections (Texts, Data, @dots{}) >> > @c the above section title is WAY too long. maybe cut the part between >> > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in >> > index 265f1be..c4c387b 100644 >> > --- a/gcc/doc/tm.texi.in >> > +++ b/gcc/doc/tm.texi.in >> > @@ -4890,6 +4890,8 @@ them: try the first ones in this list first. >> > >> > @hook TARGET_SCHED_FUSION_PRIORITY >> > >> > +@hook TARGET_EXPAND_DIVMOD_LIBFUNC >> > + >> > @node Sections >> > @section Dividing the Output into Sections (Texts, Data, @dots{}) >> > @c the above section title is WAY too long. maybe cut the part between >> > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c >> > index 0b32d5f..42c6973 100644 >> > --- a/gcc/internal-fn.c >> > +++ b/gcc/internal-fn.c >> > @@ -2207,6 +2207,53 @@ expand_ATOMIC_COMPARE_EXCHANGE (internal_fn, gcall >> > *call) >> > expand_ifn_atomic_compare_exchange (call); >> > } >> > >> > +/* Expand DIVMOD() using: >> In general, we do not use () when referring to objects in comments. >> >> > + a) optab handler for udivmod/sdivmod if it is available. >> > + b) If optab_handler doesn't exist, generate call to >> > + target-specific divmod libfunc. */ >> > + >> > +static void >> > +expand_DIVMOD (internal_fn, gcall *call_stmt) >> In general, don't use upper case for function names. Lower case please. But >> I believe we should be pushing all the expansion stuff into the gimple->rtl >> expansion code, so I haven't looked at this closely yet on the expectation >> it'll likely move. >> >> >> >> > diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c >> > index 0cea1a8..4f7bdb2 100644 >> > --- a/gcc/tree-ssa-math-opts.c >> > +++ b/gcc/tree-ssa-math-opts.c >> > @@ -3793,6 +3799,226 @@ match_uaddsub_overflow (gimple_stmt_iterator *gsi, >> > gimple *stmt, >> > return true; >> > } >> > >> > +/* Return true if target has support for divmod. */ >> > + >> > +static bool >> > +target_supports_divmod_p (optab divmod_optab, optab div_optab, >> > machine_mode >> > mode) >> So this is the kind of code I expect to avoid in the gimple optimizers. >> Instead the decision about whether to use a divmod insn, divmod libfunc or >> div >> + synthesized mod, or separate div and mod insns should happen at gimple->rtl >> expansion time. >> >> >> > + >> > +/* This function looks for: >> > + t1 = a TRUNC_DIV_EXPR b; >> > + t2 = a TRUNC_MOD_EXPR b; >> > + and transforms it to the following sequence: >> > + complex_tmp = DIVMOD (a, b); >> > + t1 = REALPART_EXPR(a); >> > + t2 = IMAGPART_EXPR(b); >> > + For conditions enabling the transform see divmod_candidate_p(). >> > + >> > + The pass works in two phases: >> > + 1) Walk through all immediate uses of stmt's operand and find a >> > + TRUNC_DIV_EXPR with matching operands and if such a stmt is found >> > add >> > + it to stmts vector. >> > + 2) Insert DIVMOD call before first div/mod stmt in top_bb (basic block >> > that >> > + dominates other div/mod stmts with same operands) and update entries >> > in >> > + stmts vector to use return value of DIMOVD (REALEXPR_PART for div, >> > + IMAGPART_EXPR for mod). */ >> > + >> > +static bool >> > +convert_to_divmod (gassign *stmt) >> > +{ >> > + if (!divmod_candidate_p (stmt)) >> > + return false; >> > + >> > + tree op1 = gimple_assign_rhs1 (stmt); >> > + tree op2 = gimple_assign_rhs2 (stmt); >> > + >> > + imm_use_iterator use_iter; >> > + gimple *use_stmt; >> > + auto_vec<gimple *> stmts; >> > + >> > + gimple *top_stmt = stmt; >> > + basic_block top_bb = gimple_bb (stmt); >> > + >> > + /* Try to set top_stmt to "topmost" stmt >> > + with code TRUNC_DIV_EXPR/TRUNC_MOD_EXPR having same operands as stmt. >> > */ >> > + >> > + FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, op1) >> > + { >> > + if (is_gimple_assign (use_stmt) >> > + && (gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR >> > + || gimple_assign_rhs_code (use_stmt) == TRUNC_MOD_EXPR) >> > + && operand_equal_p (op1, gimple_assign_rhs1 (use_stmt), 0) >> > + && operand_equal_p (op2, gimple_assign_rhs2 (use_stmt), 0)) >> So why check for TRUNC_MOD_EXPR here? ISTM that stmt is always going to be >> TRUNC_MOD_EXPR and thus you're only interested in looking for a matching >> TRUNC_DIV_EXPR statement. >> >> The only way I could see TRUNC_MOD_EXPR being useful here would be if there >> is >> a redundant TRUNC_MOD_EXPR in the IL, which would be a sign that some other >> optimizer hasn't done its job. Have you seen this to be useful in practice? > > Note that I've reviewed this parts already and we restructured things > in this way, requiring to look for TRUNC_MOD_EXPR to properly handle > finding a dominating trunc _or_ div and interatively doing so > correctly if we have more than one pair. > >> > + { >> > + if (stmt_can_throw_internal (use_stmt)) >> > + continue; >> > + >> > + basic_block bb = gimple_bb (use_stmt); >> > + >> > + if (bb == top_bb) >> > + { >> > + if (gimple_uid (use_stmt) < gimple_uid (top_stmt)) >> > + top_stmt = use_stmt; >> > + } >> > + else if (dominated_by_p (CDI_DOMINATORS, top_bb, bb)) >> > + { >> > + top_bb = bb; >> > + top_stmt = use_stmt; >> > + } >> Do you want to break out of the immediate use loop once you've found a >> TRUNC_DIV_EXPR statement with matching operands? >> >> I could perhaps see value in finding the closest TRUNC_DIV_EXPR to the >> TRUNC_MOD_EXPR, but that would mean that we've got redundant TRUNC_DIV_EXPR >> statements in the IL, which presumably doesn't happen if the rest of the >> optimizers are doing their job. >> >> >> >> > + } >> > + } >> > + >> > + if (top_stmt == stmt && stmt_can_throw_internal (top_stmt)) >> > + return false; >> Don't you just want >> if (stmt_can_throw_internal (stmt)) >> return false; >> >> Before the loop over the immediate uses, thus avoiding the loop if we've got >> a >> TRUNC_MOD_EXPR that we can't optimize? >> >> >> >> > + >> > + tree top_op1 = gimple_assign_rhs1 (top_stmt); >> > + tree top_op2 = gimple_assign_rhs2 (top_stmt); >> > + >> > + stmts.safe_push (top_stmt); >> > + bool div_seen = (gimple_assign_rhs_code (top_stmt) == TRUNC_DIV_EXPR); >> > + >> > + /* Ensure that gimple_bb (use_stmt) is dominated by top_bb. */ >> ?!? It's not clear why you'd need to do another immediate use loop here. >> >> ISTM at this point you've found two statements, one a TRUNC_DIV the other a >> TRUNC_MOD with the same operands. Given that, ISTM you just need to check >> the >> dominance relationship of the blocks containing those statements. >> >> It really feels like you're going out of your way to handle cases where we >> have redundant expressions in the IL. Maybe I'm wrong. BUt if I'm right, I'd >> rather see that time spent optimizing away those redundant expressions in >> FRE/PRE/DOM. >> >> > + >> > + /* Create libcall to internal fn DIVMOD: >> > + divmod_tmp = DIVMOD (op1, op2). */ >> > + >> > + gcall *call_stmt = gimple_build_call_internal (IFN_DIVMOD, 2, op1, op2); >> > + tree res = make_temp_ssa_name ( >> > + build_complex_type (TREE_TYPE (op1)), >> > + call_stmt, "divmod_tmp"); >> Please review the the GCC formatting guidelines. This looks wrong. >> >> tree res = make_temp_ssa_name (build_complex_type (TREE_TYPE (op1)), >> call_stmt, "divmod_tmp"); >> >> Seems like the right formatting to me. >> >> > + >> > + /* Update all statements in stmts. >> > + if stmt is lhs = op1 TRUNC_DIV_EXPR op2, change to lhs = >> > REALPART_EXPR<divmod_tmp> >> > + if stmt is lhs = op1 TRUNC_MOD_EXPR op2, change to lhs = >> > IMAGPART_EXPR<divmod_tmp>. */ >> I'd just emit a copy from RES to the appropriate lhs operand just after the >> divmod and delete the now unnecessary TRUNC_DIV_EXPR and TRUNC_MOD_EXPR >> statements. > > That sounds like a good idea. Um sorry, not sure if I understood this part. For eg: t1 = x / y; t2 = x % y;
do we want to transform it to: complex_tmp = DIVMOD (x, y); div_tmp = REALPART_EXPR<complex_tmp> mod_tmp = IMAGPART_EXPR<complex_tmp> and then replace each use of lhs (t1, t2 etc.) by either div_tmp or mod_tmp (depending if the stmt was trunc_div_expr or trunc_mod_expr) ? Thanks, Prathamesh > >> Copy propagation should clean that up just fine and it simplifies your >> implementation. > > There should be no copy to propagate that way. > >> > + >> > + bool cfg_changed = false; >> > + for (unsigned i = 0; stmts.iterate (i, &use_stmt); ++i) >> > + { >> > + tree new_rhs; >> > + >> > + switch (gimple_assign_rhs_code (use_stmt)) >> > + { >> > + case TRUNC_DIV_EXPR: >> > + new_rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); >> > + break; >> > + >> > + case TRUNC_MOD_EXPR: >> > + new_rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op2), res); >> > + break; >> > + >> > + default: >> > + gcc_unreachable (); >> > + } >> > + >> > + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); >> > + gimple_assign_set_rhs_from_tree (&gsi, new_rhs); >> > + update_stmt (use_stmt); >> > + >> > + if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) >> > + cfg_changed = true; >> Does this ever happen given how you filter out throwing statements earlier? > > We filter out throwing stmts that are not "top". The top one we replace > with the divmod call and we can preserve EH for that (and thus handle > the case where the original div/mod at this place may throw). > >> >> > @@ -3870,6 +4098,10 @@ pass_optimize_widening_mul::execute (function *fun) >> > match_uaddsub_overflow (&gsi, stmt, code); >> > break; >> > >> > + case TRUNC_MOD_EXPR: >> > + cfg_changed = convert_to_divmod (as_a<gassign *> (stmt)); >> > + break; >> > + >> > default:; >> > } >> > } >> If I'm reading this correctly, ISTM that you only search for a divmod pair if >> you first find the mod insn. That's probably reasonable. We have to find >> both to have an optimization opportunity and I suspect we have more div than >> mod insns, so you're presumably minimizing the search space with this >> approach. Can you confirm that was your intention here, or were you looking >> to do something else? > > Yes, that was the idea. > > Richard.