On Wed, 4 Nov 2015, Prathamesh Kulkarni wrote: > On 2 November 2015 at 18:31, Richard Biener <rguent...@suse.de> wrote: > > On Mon, 2 Nov 2015, Prathamesh Kulkarni wrote: > > > >> On 2 November 2015 at 13:20, Prathamesh Kulkarni > >> <prathamesh.kulka...@linaro.org> wrote: > >> > On 30 October 2015 at 15:57, Richard Biener <richard.guent...@gmail.com> > >> > wrote: > >> >> On Fri, Oct 30, 2015 at 8:39 AM, Prathamesh Kulkarni > >> >> <prathamesh.kulka...@linaro.org> wrote: > >> >>> Hi, > >> >>> I have attached revamped version of Kugan's patch > >> >>> (https://gcc.gnu.org/ml/gcc/2013-06/msg00100.html) for PR43721. > >> >>> Test-case: http://pastebin.com/QMfpXLD9 > >> >>> divmod pass dump: http://pastebin.com/yMY1ikCp > >> >>> Assembly: http://pastebin.com/kk2HZpvA > >> >>> > >> >>> The approach I took is similar to sincos pass, which involves two > >> >>> parts: > >> >>> a) divmod pass that transforms: > >> >>> t1 = a / b; > >> >>> t2 = a % b; > >> >>> to the following sequence: > >> >>> complex_tmp = DIVMOD (a, b); > >> >>> t1 = REALPART_EXPR(a); > >> >>> t2 = IMAGPART_EXPR(b); > >> >>> > >> >>> b) DIVMOD(a, b) is represented as an internal-fn and is expanded by > >> >>> expand_DIVMOD(). > >> >>> I am not sure if I have done this correctly. I was somehow looking to > >> >>> reuse expand_divmod() but am not able to figure out how to do that > >> >>> (AFAIU expand_divmod() strictly returns either the quotient or > >> >>> remainder but never both which is what I want), so ended up with > >> >>> manually expanding to rtx. > >> >>> > >> >>> While going through the sincos pass in execute_cse_sincos_1, I didn't > >> >>> understand the following: > >> >>> if (gimple_purge_dead_eh_edges (gimple_bb (stmt))) > >> >>> cfg_changed = true; > >> >>> Um why is the call to gimple_purge_dead_eh_edges necessary here? > >> >> > >> >> The call replacement might no longer throw. > >> >> > >> >>> A silly question, what options to pass to gcc to print statistics ? I > >> >>> added statistics_counter_event to keep track of number of calls > >> >>> inserted/used but not sure how to print them :/ > >> >> > >> >> -fdump-tree-XXX-stats or -fdump-statistics-stats > >> > Thanks, I can see it now -;) > >> >> > >> >>> I would be grateful for suggestions for improving the patch. > >> >> > >> >> First of all new functions need comments (expand_DIVMOD). > >> >> > >> >> Second - what's the reasoning for the pass placement seen? I think > >> >> the transform is a lowering one, improving RTL expansion. The > >> >> DIVMOD representation is harmful for GIMPLE optimizers. > >> >> > >> >> Third - I'd have integrated this with an existing pass - we have another > >> >> lowering / RTL expansion kind pass which is pass_optimize_widening_mul. > >> >> Please piggy-back on it. > >> >> > >> >> You seem to do the transform unconditionally even if the target has > >> >> instructions for division and modulus but no instruction for divmod > >> >> in which case you'll end up emitting a library call in the end instead > >> >> of a div and mod instruction. I think the transform should be gated on > >> >> missing div/mod instructions or the availability of a divmod one. > >> >> > >> >> + if (is_gimple_assign (stmt) > >> >> + && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == > >> >> tcc_binary) > >> >> + { > >> >> + if (gimple_assign_rhs_code (stmt) == TRUNC_DIV_EXPR) > >> >> + cfg_changed |= execute_divmod_1 (stmt); > >> >> > >> >> you can directly check gimple_assign_rhs_code. I'd check for > >> >> TRUNC_MOD_EXPR > >> >> which seems to be less common and thus should cut down compile-time. > >> >> > >> >> + /* Case 2: When both are in top_bb */ > >> >> + else if (op1_def_in_bb && op2_def_in_bb) > >> >> + { > >> >> + /* FIXME: better way to compare iterators?. */ > >> >> + gimple_stmt_iterator gsi = get_later_stmt (top_bb, > >> >> def_stmt_op1, def_stmt_op2); > >> >> > >> >> You can't use get_later_stmt, it's a vectorizer speciality. To do > >> >> this test efficiently > >> >> you need stmt UIDs computed. > >> >> > >> >> I miss a testcase (or two). > >> > I have tried to address your comments in the attached patch. > >> oops, unsigned uid_no = 0; should be outside the loop in > >> pass_optimize_widening_mul::execute. > >> Done in this version of the patch. > >> > >> Thanks, > >> Prathamesh > >> > Could you please review it for me ? > > > > For the testcases you need a new target check for divmod. > Sorry I didn't understand. > Should I add sth similar to /* { dg-require-effective-target divmod } */ ?
Yes. > > > >> > I have a few questions: > >> > > >> > a) Is the check for availability of divmod correct ? > > > > You simply want optab_handler (tab, mode) != CODE_FOR_nothing > > > > The libfunc is always available. > Um I am probably missing something, I thought libfuncs are initialized > by init_optabs() > but the target can override/delete that with the target hook > targetm.init_libfuncs () ? > On AArch64 optab_libfunc (sdivmod_optab, SImode) returned NULL_RTX. Hum, I see. I thought we always have the libfunc if we don't have the instruction. > > > > When looking at TRUNC_DIV_EXPR you should also exclude > > the case where TYPE_OVERFLOW_TRAPS (type) as that should > > expand using the [su]divv optabs (no trapping overflow > > divmod optab exists). > > > >> > b) Assume TRUNC_DIV_EXPR stmt is in bb0 and TRUNC_DIV_MOD in bb1. > >> > I choose to transform if bb0 dominates bb1 or bb1 dominates bb0 (or bb0 > >> > == bb1). > >> > However I wonder if we should also check that if bb0 dominates bb1 > >> > then bb1 should > >> > post-dominate bb0 ? > > > > Depends on how expensive divmod is compared to div or mod. An alternative > > transform would be to look whether [us]mod optabs are available and if not > > implement modulo via div and subtract, CSEing the division itself. Not > > sure if any target that provides div patterns does not also provide > > mod ones. > > > >> > For instance the transform gets applied to test-case in pr43721-2.c. > >> > bb containing rem = x % y doesn't post-dominate bb containing quot = x / > >> > y; > >> > I wonder if we want to perform the transform for this case since control > >> > flow > >> > may not always reach from quot = x / y to rem = x % y ? > > > > See above, it's a cost question. We do for sincos as computing sin or > > cos if the other is already computed is cheap. I would suggest the > > same is true for div/mod. > > > >> > c) A silly queston, I wonder if vec<gimple *> stmts in > >> > convert_to_divmod() will > >> > have at most 2 entries (assuming TRUNC_MOD_EXPR stmt is also added in > >> > the vector) ? > >> > I assume that cse will take place before widening_mul pass executes > >> > (since pass_fre/pass_fre executes earlier), and there will be no > >> > duplicate gimple stmts ? > > > > There are passes inbetween widen-mult and the last CSE that (while > > not very likely) makes it possible (VRP jump threading). > > > >> > So vec<gimple *> stmts would contain at most 2 entries - gimple stmt > >> > with subcode TRUNC_MOD_EXPR > >> > and gimple stmt with subcode TRUNC_DIV_EXPR having same operands. > >> > There won't be another gimple stmt with subcode TRUNC_DIV_EXPR with > >> > same operands ? > > > > I wouldn't say so, see above. > Thanks for the explanation. > > > >> > d) I am not sure if I have correctly done expansion of divmod to rtx > >> > in expand_DIVMOD() (I fear I may be missing > >> > many checks that are in expmed.c:expand_divmod). > > > > I'm not sure either, esp. about the libcall path and you choosing > > a wider mode eventually (is this necessary?). Did you make sure > > this path works? > Removed choosing widening modes in this patch. > During the widening_mul pass it checks is divmod is available for the given > mode > and only then does the transform. It probably doesn't make sense to check > for a wider mode during expand_DIVMOD, since optab_libfunc should be > available for divmod with the given mode ? Yes. > > > > + /* Compute stmt uids. */ > > + unsigned uid_no = 0; > > + FOR_EACH_BB_FN (bb, fun) > > + { > > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p > > (gsi); gsi_next (&gsi)) > > + { > > + gimple *stmt = gsi_stmt (gsi); > > + gimple_set_uid (stmt, uid_no++); > > + } > > + } > > > > renumber_gimple_stmt_uids () > Thanks. > > The attached patch has following additional changes: > a) Fixes ICE when one of the operands was constant (unconditionally > used SSA_NAME_DEF_STMT/ FOR_EACH_IMM_USE_FAST on constant tree > operands). > b) Adds use_stmt to vector_stmts only if the operands match and in the > same order. > I was incorrectly applying the transform to the following case: t1 = x > / y; t2 = y % x. +/* Expand DIVMOD() using: + a) optab handler for udivmod/sdivmod if it is available + b) If optab_handler doesn't exist, Generate call to optab_libfunc for udivmod/sdivmod. */ long line + expand_twoval_binop (tab, op0, op1, quotient, remainder, TYPE_UNSIGNED (type)); + + likewise + machine_mode libval_mode = smallest_mode_for_size (2 * GET_MODE_BITSIZE (mode), MODE_INT); likewise (please verify elsewhere yourself). Our column limit is 80. Testcases still need adjustment with an effective tagret. Btw, did you investigate code gen differences on x86_64/i586? That target expands all divisions/modulo ops via divmod, relying on CSE solely as the HW always computes both div and mod (IIRC). + if ((optab_handler (tab, mode) == CODE_FOR_nothing) && !optab_libfunc (tab, mode)) + return false; As originally said we shouldn't generate divmod library calls if the target has non-library div or mod implementations. Thus the proper check would be to && (!optab_libfunc (tab ...) || optab_handler (div) || optab_handler (mod)))) + /* FIXME: Is it safe to assume both operands cannot be constant + since constant folding would have taken place ? */ + gcc_assert (! (TREE_CONSTANT (op1) && TREE_CONSTANT (op2))); On trapping ops 1/0 might prevail for example. So I'd rather have a runtime check if op is an SSA_NAME I'm not sure I would even look at a / 3; a % 3 or 3 / a; 3 % a though (they tend to have special expansions). + FOR_EACH_IMM_USE_FAST (use_p, use_iter, op) use FOR_EACH_IMM_USE_STMT to only visit stmts once. + { + gimple *use_stmt = USE_STMT (use_p); + if (is_gimple_assign (use_stmt) + && gimple_assign_rhs_code (use_stmt) == TRUNC_DIV_EXPR + && !TYPE_OVERFLOW_TRAPS (TREE_TYPE (gimple_assign_lhs (use_stmt)))) Please check this upfront - you know the type from the stmt you are starting the visiting from. + { + tree u_op1 = gimple_assign_rhs1 (use_stmt); + tree u_op2 = gimple_assign_rhs2 (use_stmt); + + if ((operand_equal_p (op1, u_op1, 0)) && operand_equal_p (op2, u_op2, 0)) + maybe_record_divmod (stmts, top_bb, use_stmt); + /* If either operand is a constant, treat it's "definition" as + outside of top_bb. */ + + if (!TREE_CONSTANT (op1)) + { check TREE_CODE (op1) == SSA_NAME here and in related places. You seem to insert the call at the place of the definitions of the division/modulo operands rather than either at the point of the division or the modulo. Don't do that. Insert at the topmost division/modulo stmt. + for (unsigned i = 0; i < stmts.length (); ++i) + { + tree rhs; + use_stmt = stmts[i]; + + switch (gimple_assign_rhs_code (use_stmt)) + { + case TRUNC_DIV_EXPR: + rhs = fold_build1 (REALPART_EXPR, TREE_TYPE (op1), res); + break; + + case TRUNC_MOD_EXPR: + rhs = fold_build1 (IMAGPART_EXPR, TREE_TYPE (op1), res); + break; + + default: + gcc_unreachable (); + } + + gassign *assign_stmt = gimple_build_assign (gimple_assign_lhs (use_stmt), rhs); + gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt); Ick. Please use gimple_set_rhs_from_tree (use_stmt, res); update_stmt (use_stmt); if (maybe_clean_or_replace_eh_stmt (use_stmt, use_stmt)) cfg_changed = true; + free_dominance_info (CDI_DOMINATORS); do not free dominators. Richard. > Thanks, > Prathamesh > > > > Richard. > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)