On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote: > gcc/ > * config/aarch64/aarch64-protos.h > (tune_params): Add new member "approx_div_modes". > (aarch64_emit_approx_div): Declare new function. > * config/aarch64/aarch64.c > (generic_tunings): New member "approx_div_modes". > (cortexa35_tunings): Likewise. > (cortexa53_tunings): Likewise. > (cortexa57_tunings): Likewise. > (cortexa72_tunings): Likewise. > (exynosm1_tunings): Likewise. > (thunderx_tunings): Likewise. > (xgene1_tunings): Likewise. > (aarch64_emit_approx_div): Define new function. > * config/aarch64/aarch64.md ("div<mode>3"): New expansion. > * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise. > * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option. > * doc/invoke.texi (-mlow-precision-div): Describe new option.
My comments from the other two patches around using a structure to group up the tuning flags and whether we really want the new option apply here too. This code has no consumers by default and is only used for -mlow-precision-div. Is this option likely to be useful to our users in practice? It might all be more palatable under something like the rs6000's -mrecip=opt . > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 47ccb18..7e99e16 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -1509,7 +1509,19 @@ > [(set_attr "type" "neon_fp_mul_<Vetype><q>")] > ) > > -(define_insn "div<mode>3" > +(define_expand "div<mode>3" > + [(set (match_operand:VDQF 0 "register_operand") > + (div:VDQF (match_operand:VDQF 1 "general_operand") What does this relaxation to general_operand give you? > + (match_operand:VDQF 2 "register_operand")))] > + "TARGET_SIMD" > +{ > + if (aarch64_emit_approx_div (operands[0], operands[1], operands[2])) > + DONE; > + > + operands[1] = force_reg (<MODE>mode, operands[1]); ...other than the need to do this (sorry if I've missed something obvious). > +}) > + > +(define_insn "*div<mode>3" > [(set (match_operand:VDQF 0 "register_operand" "=w") > (div:VDQF (match_operand:VDQF 1 "register_operand" "w") > (match_operand:VDQF 2 "register_operand" "w")))] > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 589871b..d3e73bf 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -7604,6 +7612,83 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp) > return true; > } > > +/* Emit the instruction sequence to compute the approximation for a > division. */ Long line, missing details on what the return type means and the meaning of arguments. > + > +bool > +aarch64_emit_approx_div (rtx quo, rtx num, rtx div) DIV is ambiguous (divisor, or the RTX or the division itself?) "DIVISOR" is not much more typing and is clear. > +{ > + machine_mode mode = GET_MODE (quo); > + > + if (!flag_finite_math_only > + || flag_trapping_math > + || !flag_unsafe_math_optimizations > + || optimize_function_for_size_p (cfun) > + || !(flag_mlow_precision_div > + || (aarch64_tune_params.approx_div_modes & AARCH64_APPROX_MODE > (mode)))) Long line. > + return false; > + > + /* Estimate the approximate reciprocal. */ > + rtx xrcp = gen_reg_rtx (mode); > + switch (mode) > + { > + case SFmode: > + emit_insn (gen_aarch64_frecpesf (xrcp, div)); break; > + case V2SFmode: > + emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break; > + case V4SFmode: > + emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break; > + case DFmode: > + emit_insn (gen_aarch64_frecpedf (xrcp, div)); break; > + case V2DFmode: > + emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break; > + default: > + gcc_unreachable (); > + } Factor this to get_recpe_type or similar (as was done for get_rsqrts_type). > + > + /* Iterate over the series twice for SF and thrice for DF. */ > + int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2; > + > + /* Optionally iterate over the series once less for faster performance, > + while sacrificing the accuracy. */ > + if (flag_mlow_precision_div) > + iterations--; > + > + /* Iterate over the series to calculate the approximate reciprocal. */ > + rtx xtmp = gen_reg_rtx (mode); > + while (iterations--) > + { > + switch (mode) > + { > + case SFmode: > + emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break; > + case V2SFmode: > + emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break; > + case V4SFmode: > + emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break; > + case DFmode: > + emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break; > + case V2DFmode: > + emit_insn (gen_aarch64_frecpsv2df (xtmp, xrcp, div)); break; > + default: > + gcc_unreachable (); > + } > + > + if (iterations > 0) > + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp)); > + } > + > + if (num != CONST1_RTX (mode)) > + { > + /* Calculate the approximate division. */ > + rtx xnum = force_reg (mode, num); > + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum)); > + } > + > + /* Return the approximation. */ > + emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp)); > + return true; > +} > + > /* Return the number of instructions that can be issued per cycle. */ > static int > aarch64_sched_issue_rate (void) > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > index aab3e00..a248f06 100644 > --- a/gcc/config/aarch64/aarch64.md > +++ b/gcc/config/aarch64/aarch64.md > @@ -4665,11 +4665,22 @@ > [(set_attr "type" "fmul<s>")] > ) > > -(define_insn "div<mode>3" > +(define_expand "div<mode>3" > + [(set (match_operand:GPF 0 "register_operand") > + (div:GPF (match_operand:GPF 1 "general_operand") > + (match_operand:GPF 2 "register_operand")))] > + "TARGET_SIMD" > +{ > + if (aarch64_emit_approx_div (operands[0], operands[1], operands[2])) > + DONE; > + > + operands[1] = force_reg (<MODE>mode, operands[1]); > +}) > + Same comment as above regarding general_operand. > +(define_insn "*div<mode>3" > [(set (match_operand:GPF 0 "register_operand" "=w") > - (div:GPF > - (match_operand:GPF 1 "register_operand" "w") > - (match_operand:GPF 2 "register_operand" "w")))] > + (div:GPF (match_operand:GPF 1 "register_operand" "w") > + (match_operand:GPF 2 "register_operand" "w")))] > "TARGET_FLOAT" > "fdiv\\t%<s>0, %<s>1, %<s>2" > [(set_attr "type" "fdiv<s>")] Thanks, James