On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
> gcc/
> * config/aarch64/aarch64-protos.h
> (tune_params): Add new member "approx_div_modes".
> (aarch64_emit_approx_div): Declare new function.
> * config/aarch64/aarch64.c
> (generic_tunings): New member "approx_div_modes".
> (cortexa35_tunings): Likewise.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (cortexa72_tunings): Likewise.
> (exynosm1_tunings): Likewise.
> (thunderx_tunings): Likewise.
> (xgene1_tunings): Likewise.
> (aarch64_emit_approx_div): Define new function.
> * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
> * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
> * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
> * doc/invoke.texi (-mlow-precision-div): Describe new option.
My comments on the other two patches, about using a structure to group
the tuning flags and about whether we really want the new option, apply
here too.
This code has no consumers by default and is only used for
-mlow-precision-div. Is this option likely to be useful to our users in
practice? It might all be more palatable under something like the rs6000's
-mrecip=opt.
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aarch64-simd.md
> index 47ccb18..7e99e16 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1509,7 +1509,19 @@
> [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> )
>
> -(define_insn "div<mode>3"
> +(define_expand "div<mode>3"
> + [(set (match_operand:VDQF 0 "register_operand")
> + (div:VDQF (match_operand:VDQF 1 "general_operand")
What does this relaxation to general_operand give you?
> + (match_operand:VDQF 2 "register_operand")))]
> + "TARGET_SIMD"
> +{
> + if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> + DONE;
> +
> + operands[1] = force_reg (<MODE>mode, operands[1]);
...other than the need to do this (sorry if I've missed something obvious).
> +})
> +
> +(define_insn "*div<mode>3"
> [(set (match_operand:VDQF 0 "register_operand" "=w")
> (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
> (match_operand:VDQF 2 "register_operand" "w")))]
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 589871b..d3e73bf 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7604,6 +7612,83 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
> return true;
> }
>
> +/* Emit the instruction sequence to compute the approximation for a
> division. */
Long line. The comment is also missing details on what the return value
means and what the arguments mean.
> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx div)
DIV is ambiguous (the divisor, or the RTX for the division itself?).
"DIVISOR" is not much more typing and is clear.
> +{
> + machine_mode mode = GET_MODE (quo);
> +
> + if (!flag_finite_math_only
> + || flag_trapping_math
> + || !flag_unsafe_math_optimizations
> + || optimize_function_for_size_p (cfun)
> + || !(flag_mlow_precision_div
> + || (aarch64_tune_params.approx_div_modes & AARCH64_APPROX_MODE
> (mode))))
Long line.
> + return false;
> +
> + /* Estimate the approximate reciprocal. */
> + rtx xrcp = gen_reg_rtx (mode);
> + switch (mode)
> + {
> + case SFmode:
> + emit_insn (gen_aarch64_frecpesf (xrcp, div)); break;
> + case V2SFmode:
> + emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break;
> + case V4SFmode:
> + emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break;
> + case DFmode:
> + emit_insn (gen_aarch64_frecpedf (xrcp, div)); break;
> + case V2DFmode:
> + emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break;
> + default:
> + gcc_unreachable ();
> + }
Factor this out into get_recpe_type or similar (as was done for
get_rsqrts_type).
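A rough sketch of the shape I have in mind, not a drop-in replacement. The
rtx and machine_mode typedefs and the generator bodies below are mock
stand-ins so that the fragment compiles outside the tree; only the
gen_aarch64_frecpe* names come from the patch. Returning NULL for an
unsupported mode is an assumption made for the mock; in the tree the
default case would more likely stay as gcc_unreachable ().

```c
#include <stddef.h>

/* Mock stand-ins for GCC's rtx and machine_mode (assumptions for this
   sketch only).  */
typedef int rtx;
typedef enum
{
  SFmode, V2SFmode, V4SFmode, DFmode, V2DFmode, TImode
} machine_mode;

/* Mock generators.  Each returns a distinct tag so that a caller can
   tell which one was selected; the real ones return an insn pattern.  */
static rtx gen_aarch64_frecpesf (rtx d, rtx s) { (void) d; (void) s; return 1; }
static rtx gen_aarch64_frecpev2sf (rtx d, rtx s) { (void) d; (void) s; return 2; }
static rtx gen_aarch64_frecpev4sf (rtx d, rtx s) { (void) d; (void) s; return 3; }
static rtx gen_aarch64_frecpedf (rtx d, rtx s) { (void) d; (void) s; return 4; }
static rtx gen_aarch64_frecpev2df (rtx d, rtx s) { (void) d; (void) s; return 5; }

typedef rtx (*recpe_type) (rtx, rtx);

/* Return the FRECPE generator for MODE, or NULL if MODE has none.  */
static recpe_type
get_recpe_type (machine_mode mode)
{
  switch (mode)
    {
    case SFmode:   return gen_aarch64_frecpesf;
    case V2SFmode: return gen_aarch64_frecpev2sf;
    case V4SFmode: return gen_aarch64_frecpev4sf;
    case DFmode:   return gen_aarch64_frecpedf;
    case V2DFmode: return gen_aarch64_frecpev2df;
    default:       return NULL;
    }
}
```

The call site then collapses to a single
emit_insn ((*get_recpe_type (mode)) (xrcp, div)), and the FRECPS switch
inside the loop could presumably be factored the same way.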
> +
> + /* Iterate over the series twice for SF and thrice for DF. */
> + int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> + /* Optionally iterate over the series once less for faster performance,
> + while sacrificing the accuracy. */
> + if (flag_mlow_precision_div)
> + iterations--;
> +
> + /* Iterate over the series to calculate the approximate reciprocal. */
> + rtx xtmp = gen_reg_rtx (mode);
> + while (iterations--)
> + {
> + switch (mode)
> + {
> + case SFmode:
> + emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break;
> + case V2SFmode:
> + emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break;
> + case V4SFmode:
> + emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break;
> + case DFmode:
> + emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break;
> + case V2DFmode:
> + emit_insn (gen_aarch64_frecpsv2df (xtmp, xrcp, div)); break;
> + default:
> + gcc_unreachable ();
> + }
> +
> + if (iterations > 0)
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> + }
> +
> + if (num != CONST1_RTX (mode))
> + {
> + /* Calculate the approximate division. */
> + rtx xnum = force_reg (mode, num);
> + emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> + }
> +
> + /* Return the approximation. */
> + emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> + return true;
> +}
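As a sanity check on the iteration counts, here is a small scalar model of
the sequence above. It is only a sketch: recpe_model is a hypothetical
stand-in that injects a 2^-8 relative error into the exact reciprocal, on
the assumption that FRECPE is good to roughly 8 bits, and recps_model
computes 2 - a * b as FRECPS does. The function names and the DIVISOR
parameter name are mine, not the patch's.

```c
/* Hypothetical stand-in for FRECPE: the exact reciprocal of DIVISOR with
   a relative error of 2^-8 injected (an assumption, see above).  */
static double
recpe_model (double divisor)
{
  return (1.0 / divisor) * (1.0 - 1.0 / 256.0);
}

/* Stand-in for FRECPS: return 2 - A * B.  */
static double
recps_model (double a, double b)
{
  return 2.0 - a * b;
}

/* Return an approximation of NUM / DIVISOR after ITERATIONS (>= 1)
   Newton-Raphson steps.  As in the patch, the multiply for the last
   step is folded into the final product.  */
static double
approx_div_model (double num, double divisor, int iterations)
{
  double xrcp = recpe_model (divisor);
  double xtmp = 1.0;

  while (iterations--)
    {
      xtmp = recps_model (xrcp, divisor);
      if (iterations > 0)
        xrcp = xrcp * xtmp;
    }

  xrcp = xrcp * num;
  return xrcp * xtmp;
}

/* Return the magnitude of the relative error of APPROX against EXACT.  */
static double
rel_error (double approx, double exact)
{
  double e = (approx - exact) / exact;
  return e < 0 ? -e : e;
}
```

Under that assumption each step squares the relative error (2^-8, 2^-16,
2^-32, 2^-64), so two steps cover the 24-bit SF significand and three
cover the 53-bit DF one. Dropping a step for -mlow-precision-div leaves
roughly 16 good bits for SF and 32 for DF.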
> +
> /* Return the number of instructions that can be issued per cycle. */
> static int
> aarch64_sched_issue_rate (void)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aab3e00..a248f06 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4665,11 +4665,22 @@
> [(set_attr "type" "fmul<s>")]
> )
>
> -(define_insn "div<mode>3"
> +(define_expand "div<mode>3"
> + [(set (match_operand:GPF 0 "register_operand")
> + (div:GPF (match_operand:GPF 1 "general_operand")
> + (match_operand:GPF 2 "register_operand")))]
> + "TARGET_SIMD"
> +{
> + if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> + DONE;
> +
> + operands[1] = force_reg (<MODE>mode, operands[1]);
> +})
> +
Same comment as above regarding general_operand.
> +(define_insn "*div<mode>3"
> [(set (match_operand:GPF 0 "register_operand" "=w")
> - (div:GPF
> - (match_operand:GPF 1 "register_operand" "w")
> - (match_operand:GPF 2 "register_operand" "w")))]
> + (div:GPF (match_operand:GPF 1 "register_operand" "w")
> + (match_operand:GPF 2 "register_operand" "w")))]
> "TARGET_FLOAT"
> "fdiv\\t%<s>0, %<s>1, %<s>2"
> [(set_attr "type" "fdiv<s>")]
Thanks,
James