Re: [PATCH 3/3][AArch64] Emit division using the Newton series

James Greenhalgh Wed, 25 May 2016 09:17:16 -0700

On Wed, Apr 27, 2016 at 04:15:53PM -0500, Evandro Menezes wrote:
>    gcc/
>         * config/aarch64/aarch64-protos.h
>         (tune_params): Add new member "approx_div_modes".
>         (aarch64_emit_approx_div): Declare new function.
>         * config/aarch64/aarch64.c
>         (generic_tunings): New member "approx_div_modes".
>         (cortexa35_tunings): Likewise.
>         (cortexa53_tunings): Likewise.
>         (cortexa57_tunings): Likewise.
>         (cortexa72_tunings): Likewise.
>         (exynosm1_tunings): Likewise.
>         (thunderx_tunings): Likewise.
>         (xgene1_tunings): Likewise.
>         (aarch64_emit_approx_div): Define new function.
>         * config/aarch64/aarch64.md ("div<mode>3"): New expansion.
>         * config/aarch64/aarch64-simd.md ("div<mode>3"): Likewise.
>         * config/aarch64/aarch64.opt (-mlow-precision-div): Add new option.
>         * doc/invoke.texi (-mlow-precision-div): Describe new option.


My comments on the other two patches, about using a structure to group
the tuning flags and about whether we really want the new option, apply
here too.

This code has no consumers by default and is only used for
-mlow-precision-div. Is this option likely to be useful to our users in
practice? It might all be more palatable under something like the rs6000's
-mrecip=opt.
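
To make the suggestion concrete: rs6000's -mrecip=opt takes a
comma-separated list of names, and a leading "!" turns a name off again.
Below is a minimal standalone sketch of that style of parsing. The
sub-option names ("div", "sqrt", "rsqrt", "all", "none"), the APPROX_*
bits and parse_recip_arg are all hypothetical, chosen only for
illustration; none of them exists in the AArch64 back end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bits, one per approximated operation.  */
#define APPROX_NONE    0u
#define APPROX_DIV     (1u << 0)
#define APPROX_SQRT    (1u << 1)
#define APPROX_RSQRT   (1u << 2)
#define APPROX_ALL     (APPROX_DIV | APPROX_SQRT | APPROX_RSQRT)
#define APPROX_INVALID (~0u)	/* Returned for an unknown name.  */

/* Parse a comma-separated list such as "all,!div" into a bit mask.
   A name prefixed with '!' clears its bits instead of setting them.  */
static unsigned
parse_recip_arg (const char *arg)
{
  static const struct { const char *name; unsigned bits; } table[] = {
    { "all", APPROX_ALL },   { "none", APPROX_NONE },
    { "div", APPROX_DIV },   { "sqrt", APPROX_SQRT },
    { "rsqrt", APPROX_RSQRT }
  };
  const size_t n = sizeof table / sizeof table[0];
  unsigned mask = APPROX_NONE;

  assert (arg != NULL);
  while (*arg)
    {
      size_t len = strcspn (arg, ",");	/* Length of this token.  */
      const char *tok = arg;
      size_t tlen = len;
      int invert = 0;
      size_t i;

      if (tlen > 0 && tok[0] == '!')
	{
	  invert = 1;
	  tok++;
	  tlen--;
	}

      for (i = 0; i < n; i++)
	if (strlen (table[i].name) == tlen
	    && strncmp (tok, table[i].name, tlen) == 0)
	  break;
      if (i == n)
	return APPROX_INVALID;

      if (invert)
	mask &= ~table[i].bits;
      else
	mask |= table[i].bits;

      arg += len;
      if (*arg == ',')
	arg++;
    }
  return mask;
}
```

With this sketch, parse_recip_arg ("all,!div") yields
APPROX_SQRT | APPROX_RSQRT, so a single option could cover all three
patches in the series instead of one flag per operation.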

> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 47ccb18..7e99e16 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -1509,7 +1509,19 @@
>    [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
>  )
>  
> -(define_insn "div<mode>3"
> +(define_expand "div<mode>3"
> + [(set (match_operand:VDQF 0 "register_operand")
> +       (div:VDQF (match_operand:VDQF 1 "general_operand")

What does this relaxation to general_operand give you?

> +              (match_operand:VDQF 2 "register_operand")))]
> + "TARGET_SIMD"
> +{
> +  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> +    DONE;
> +
> +  operands[1] = force_reg (<MODE>mode, operands[1]);

...other than the need to do this (sorry if I've missed something obvious).

> +})
> +
> +(define_insn "*div<mode>3"
>   [(set (match_operand:VDQF 0 "register_operand" "=w")
>         (div:VDQF (match_operand:VDQF 1 "register_operand" "w")
>                (match_operand:VDQF 2 "register_operand" "w")))]
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 589871b..d3e73bf 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7604,6 +7612,83 @@ aarch64_emit_approx_sqrt (rtx dst, rtx src, bool recp)
>    return true;
>  }
>  
> +/* Emit the instruction sequence to compute the approximation for a division.  */

Long line, and the comment is missing details on what the return value
means and what the arguments are.

> +
> +bool
> +aarch64_emit_approx_div (rtx quo, rtx num, rtx div)

DIV is ambiguous (the divisor, or the RTX for the division itself?).
"DIVISOR" is not much more typing and is clear.

> +{
> +  machine_mode mode = GET_MODE (quo);
> +
> +  if (!flag_finite_math_only
> +      || flag_trapping_math
> +      || !flag_unsafe_math_optimizations
> +      || optimize_function_for_size_p (cfun)
> +      || !(flag_mlow_precision_div
> +        || (aarch64_tune_params.approx_div_modes & AARCH64_APPROX_MODE (mode))))

Long line.

> +    return false;
> +
> +  /* Estimate the approximate reciprocal.  */
> +  rtx xrcp = gen_reg_rtx (mode);
> +  switch (mode)
> +    {
> +      case SFmode:
> +     emit_insn (gen_aarch64_frecpesf (xrcp, div)); break;
> +      case V2SFmode:
> +     emit_insn (gen_aarch64_frecpev2sf (xrcp, div)); break;
> +      case V4SFmode:
> +     emit_insn (gen_aarch64_frecpev4sf (xrcp, div)); break;
> +      case DFmode:
> +     emit_insn (gen_aarch64_frecpedf (xrcp, div)); break;
> +      case V2DFmode:
> +     emit_insn (gen_aarch64_frecpev2df (xrcp, div)); break;
> +      default:
> +     gcc_unreachable ();
> +    }

Factor this to get_recpe_type or similar (as was done for get_rsqrts_type).
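
Roughly the shape I have in mind, written as a standalone model rather
than as real aarch64.c code. The enum, the model_rtx typedef and the
gen_frecpe_* stubs are stand-ins I made up so the sketch compiles on its
own; in the patch the helper would return the matching
gen_aarch64_frecpe<mode> generator, and the default case would be
gcc_unreachable ().

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for GCC's machine_mode and rtx; these are NOT the real types.  */
enum model_mode { M_SF, M_V2SF, M_V4SF, M_DF, M_V2DF, M_OTHER };
typedef int model_rtx;

/* Stub generators.  Each returns a distinct id so the dispatch is visible.  */
static int gen_frecpe_sf (model_rtx d, model_rtx s) { (void) d; (void) s; return 1; }
static int gen_frecpe_v2sf (model_rtx d, model_rtx s) { (void) d; (void) s; return 2; }
static int gen_frecpe_v4sf (model_rtx d, model_rtx s) { (void) d; (void) s; return 3; }
static int gen_frecpe_df (model_rtx d, model_rtx s) { (void) d; (void) s; return 4; }
static int gen_frecpe_v2df (model_rtx d, model_rtx s) { (void) d; (void) s; return 5; }

typedef int (*recpe_type) (model_rtx, model_rtx);

/* Select the reciprocal-estimate generator for MODE, or NULL if MODE is
   not supported.  */
static recpe_type
get_recpe_type (enum model_mode mode)
{
  switch (mode)
    {
    case M_SF:   return gen_frecpe_sf;
    case M_V2SF: return gen_frecpe_v2sf;
    case M_V4SF: return gen_frecpe_v4sf;
    case M_DF:   return gen_frecpe_df;
    case M_V2DF: return gen_frecpe_v2df;
    default:     return NULL;
    }
}

/* The call site then shrinks to a lookup and a single call.  */
static int
emit_recpe (enum model_mode mode, model_rtx dst, model_rtx src)
{
  recpe_type gen = get_recpe_type (mode);
  assert (gen != NULL);	/* The real code would use gcc_unreachable ().  */
  return gen (dst, src);
}
```

The same treatment would apply to the FRECPS switch inside the loop
further down, which leaves each call site as one line.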

> +
> +  /* Iterate over the series twice for SF and thrice for DF.  */
> +  int iterations = (GET_MODE_INNER (mode) == DFmode) ? 3 : 2;
> +
> +  /* Optionally iterate over the series once less for faster performance,
> +     while sacrificing the accuracy.  */
> +  if (flag_mlow_precision_div)
> +    iterations--;
> +
> +  /* Iterate over the series to calculate the approximate reciprocal.  */
> +  rtx xtmp = gen_reg_rtx (mode);
> +  while (iterations--)
> +    {
> +      switch (mode)
> +        {
> +       case SFmode:
> +         emit_insn (gen_aarch64_frecpssf (xtmp, xrcp, div)); break;
> +       case V2SFmode:
> +         emit_insn (gen_aarch64_frecpsv2sf (xtmp, xrcp, div)); break;
> +       case V4SFmode:
> +         emit_insn (gen_aarch64_frecpsv4sf (xtmp, xrcp, div)); break;
> +       case DFmode:
> +         emit_insn (gen_aarch64_frecpsdf (xtmp, xrcp, div)); break;
> +       case V2DFmode:
> +         emit_insn (gen_aarch64_frecpsv2df (xtmp, xrcp, div)); break;
> +       default:
> +         gcc_unreachable ();
> +        }
> +
> +      if (iterations > 0)
> +     emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xtmp));
> +    }
> +
> +  if (num != CONST1_RTX (mode))
> +    {
> +      /* Calculate the approximate division.  */
> +      rtx xnum = force_reg (mode, num);
> +      emit_set_insn (xrcp, gen_rtx_MULT (mode, xrcp, xnum));
> +    }
> +
> +  /* Return the approximation.  */
> +  emit_set_insn (quo, gen_rtx_MULT (mode, xrcp, xtmp));
> +  return true;
> +}
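
For anyone following the arithmetic: FRECPS computes 2 - a * b, so each
pass through the loop is one Newton-Raphson step x' = x * (2 - d * x)
towards 1/d, and the relative error is squared at every step. The
scalar model below mirrors the control flow of the function above in
plain C. recpe_model is an assumption of mine, a deliberately crude
estimate with a relative error of 2^-9; it is not the value the real
FRECPE instruction returns.

```c
#include <assert.h>

/* Model of FRECPS: the Newton correction factor 2 - a * b.  */
static double
recps (double a, double b)
{
  return 2.0 - a * b;
}

/* Hypothetical stand-in for FRECPE: the exact reciprocal, perturbed to
   carry a relative error of 2^-9.  */
static double
recpe_model (double divisor)
{
  return (1.0 / divisor) * (1.0 + 1.0 / 512.0);
}

/* Mirror of the emitted sequence: refine the estimate ITERATIONS times,
   folding the last correction and the numerator into the final
   multiplications.  */
static double
approx_div (double num, double divisor, int iterations)
{
  double xrcp = recpe_model (divisor);
  double xtmp = 1.0;

  assert (iterations > 0);
  while (iterations--)
    {
      xtmp = recps (xrcp, divisor);
      if (iterations > 0)
	xrcp = xrcp * xtmp;
    }

  if (num != 1.0)
    xrcp = xrcp * num;

  return xrcp * xtmp;
}
```

In this model the relative error goes from 2^-9 to 2^-18, 2^-36 and
2^-72 after one, two and three steps, which is consistent with the
patch using two steps for SF and three for DF, and shows what dropping
one step under -mlow-precision-div costs in accuracy.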
> +
>  /* Return the number of instructions that can be issued per cycle.  */
>  static int
>  aarch64_sched_issue_rate (void)
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index aab3e00..a248f06 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4665,11 +4665,22 @@
>    [(set_attr "type" "fmul<s>")]
>  )
>  
> -(define_insn "div<mode>3"
> +(define_expand "div<mode>3"
> + [(set (match_operand:GPF 0 "register_operand")
> +       (div:GPF (match_operand:GPF 1 "general_operand")
> +             (match_operand:GPF 2 "register_operand")))]
> + "TARGET_SIMD"
> +{
> +  if (aarch64_emit_approx_div (operands[0], operands[1], operands[2]))
> +    DONE;
> +
> +  operands[1] = force_reg (<MODE>mode, operands[1]);
> +})
> +

Same comment as above regarding general_operand.

> +(define_insn "*div<mode>3"
>    [(set (match_operand:GPF 0 "register_operand" "=w")
> -        (div:GPF
> -         (match_operand:GPF 1 "register_operand" "w")
> -         (match_operand:GPF 2 "register_operand" "w")))]
> +        (div:GPF (match_operand:GPF 1 "register_operand" "w")
> +              (match_operand:GPF 2 "register_operand" "w")))]
>    "TARGET_FLOAT"
>    "fdiv\\t%<s>0, %<s>1, %<s>2"
>    [(set_attr "type" "fdiv<s>")]

Thanks,
James
