On Sat, 18 Feb 2023 10:42:51 PST (-0800), pins...@gmail.com wrote:
On Sat, Feb 18, 2023 at 10:27 AM Palmer Dabbelt <pal...@dabbelt.com> wrote:
On Fri, 17 Feb 2023 06:02:40 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> Hi all,
> If we have division and remainder calculations with the same operands:
>
> a = b / c;
> d = b % c;
>
> We can replace the calculation of remainder with multiplication +
> subtraction, using the result from the previous division:
>
> a = b / c;
> d = a * c;
> d = b - d;
>
> Which will be faster.
Do you have any benchmarks that show that performance increase? The ISA
manual specifically says the suggested sequence is div+mod, and while
those suggestions don't always pan out for real hardware it's likely
that at least some implementations will end up with the ISA-suggested
fusions.
I suspect I will be needing this kind of patch for the core that I am
going to be using.
OK, good to know. Presumably you guys aren't ready to show benchmarks,
though?
If anything this should be under a tuning option.
That seems likely, as IIRC the SiFive cores do this fusion. It
generally seems like we're going to end up with implementations all over
the place when it comes to what's fused, so I bet we'll have a lot of
these differences between cores.
Thanks,
Andrew Pinski
> Currently, it isn't done for RISC-V.
>
> I've added an expander for DIVMOD which replaces 'rem' with 'mul + sub'.
>
> Best regards,
> Matevos.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.md: Added divmod expander.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/divmod.c: New testcase.
>
> --- inline copy of the patch ---
>
> diff --git a/gcc/config/riscv/iterators.md b/gcc/config/riscv/iterators.md
> index f95dd405e12..d941483d9f1 100644
> --- a/gcc/config/riscv/iterators.md
> +++ b/gcc/config/riscv/iterators.md
> @@ -148,6 +148,11 @@
> ;; from the same template.
> (define_code_iterator any_mod [mod umod])
>
> +;; These code iterators allow unsigned and signed divmod to be generated
> +;; from the same template.
> +(define_code_iterator only_div [div udiv])
> +(define_code_attr paired_mod [(div "mod") (udiv "umod")])
> +
> ;; These code iterators allow the signed and unsigned scc operations to use
> ;; the same template.
> (define_code_iterator any_gt [gt gtu])
> @@ -175,7 +180,8 @@
> (gt "") (gtu "u")
> (ge "") (geu "u")
> (lt "") (ltu "u")
> - (le "") (leu "u")])
> + (le "") (leu "u")
> + (div "") (udiv "u")])
>
> ;; <su> is like <u>, but the signed form expands to "s" rather than "".
> (define_code_attr su [(sign_extend "s") (zero_extend "u")])
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index c8adc5af5d2..2d48ff3f8de 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -1044,6 +1044,22 @@
> [(set_attr "type" "idiv")
> (set_attr "mode" "DI")])
>
> +(define_expand "<u>divmod<mode>4"
> + [(parallel
> + [(set (match_operand:GPR 0 "register_operand")
> + (only_div:GPR (match_operand:GPR 1 "register_operand")
> + (match_operand:GPR 2 "register_operand")))
> + (set (match_operand:GPR 3 "register_operand")
> + (<paired_mod>:GPR (match_dup 1) (match_dup 2)))])]
> + "TARGET_DIV"
> + {
> + rtx tmp = gen_reg_rtx (<MODE>mode);
> + emit_insn (gen_<u>div<GPR:mode>3 (operands[0], operands[1], operands[2]));
> + emit_insn (gen_mul<GPR:mode>3 (tmp, operands[0], operands[2]));
> + emit_insn (gen_sub<GPR:mode>3 (operands[3], operands[1], tmp));
> + DONE;
> + })
> +
> (define_insn "*<optab>si3_extended"
> [(set (match_operand:DI 0 "register_operand" "=r")
> (sign_extend:DI
> diff --git a/gcc/testsuite/gcc.target/riscv/divmod.c b/gcc/testsuite/gcc.target/riscv/divmod.c
> new file mode 100644
> index 00000000000..254b25e654d
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/divmod.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-O1" "-Og" } } */
> +
> +void
> +foo(int a, int b, int *c, int *d)
> +{
> + *c = a / b;
> + *d = a % b;
> +}
> +
> +/* { dg-final { scan-assembler-not "rem" } } */
> +/* { dg-final { scan-assembler-times "mul" 1 } } */
> +/* { dg-final { scan-assembler-times "sub" 1 } } */