On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni <prathamesh.kulka...@linaro.org> wrote: > On 31 July 2015 at 15:04, Ramana Radhakrishnan > <ramana.radhakrish...@foss.arm.com> wrote: >> >> >> On 29/07/15 11:09, Prathamesh Kulkarni wrote: >>> Hi, >>> This patch tries to implement division with multiplication by >>> reciprocal using vrecpe/vrecps >>> with -funsafe-math-optimizations and -freciprocal-math enabled. >>> Tested on arm-none-linux-gnueabihf using qemu. >>> OK for trunk ? >>> >>> Thank you, >>> Prathamesh >>> >> >> I've tried this in the past and never been convinced that 2 iterations are >> enough to get to stability with this given that the results are only precise >> for 8 bits / iteration. Thus I've always believed you need 3 iterations >> rather than 2 at which point I've never been sure that it's worth it. So the >> testing that you've done with this currently is not enough for this to go >> into the tree. >> >> I'd like this to be tested on a couple of different AArch32 implementations >> with a wider range of inputs to verify that the results are acceptable as >> well as running something like SPEC2k(6) with atleast one iteration to >> ensure correctness. > Hi, > I got results of SPEC2k6 fp benchmarks: > a15: +0.64% overall, 481.wrf: +6.46% > a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76% > a57: +0.35% overall, 481.wrf: +3.84% > The other benchmarks had (almost) identical results.
Thanks for the benchmarking results - Please repost the patch with the changes that I had requested in my previous review - given it is now stage4 , I would rather queue changes like this for stage1 now. Thanks, Ramana > > Thanks, > Prathamesh >> >> >> moving on to the patches. >> >>> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md >>> index 654d9d5..28c2e2a 100644 >>> --- a/gcc/config/arm/neon.md >>> +++ b/gcc/config/arm/neon.md >>> @@ -548,6 +548,32 @@ >>> (const_string "neon_mul_<V_elem_ch><q>")))] >>> ) >>> >> >> Please add a comment here. >> >>> +(define_expand "div<mode>3" >>> + [(set (match_operand:VCVTF 0 "s_register_operand" "=w") >>> + (div:VCVTF (match_operand:VCVTF 1 "s_register_operand" "w") >>> + (match_operand:VCVTF 2 "s_register_operand" "w")))] >> >> I want to double check that this doesn't collide with Alan's patches for >> FP16 especially if he reuses the VCVTF iterator for all the vcvt f16 cases. >> >>> + "TARGET_NEON && flag_unsafe_math_optimizations && flag_reciprocal_math" >>> + { >>> + rtx rec = gen_reg_rtx (<MODE>mode); >>> + rtx vrecps_temp = gen_reg_rtx (<MODE>mode); >>> + >>> + /* Reciprocal estimate */ >>> + emit_insn (gen_neon_vrecpe<mode> (rec, operands[2])); >>> + >>> + /* Perform 2 iterations of Newton-Raphson method for better accuracy */ >>> + for (int i = 0; i < 2; i++) >>> + { >>> + emit_insn (gen_neon_vrecps<mode> (vrecps_temp, rec, operands[2])); >>> + emit_insn (gen_mul<mode>3 (rec, rec, vrecps_temp)); >>> + } >>> + >>> + /* We now have reciprocal in rec, perform operands[0] = operands[1] * >>> rec */ >>> + emit_insn (gen_mul<mode>3 (operands[0], operands[1], rec)); >>> + DONE; >>> + } >>> +) >>> + >>> + >>> (define_insn "mul<mode>3add<mode>_neon" >>> [(set (match_operand:VDQW 0 "s_register_operand" "=w") >>> (plus:VDQW (mult:VDQW (match_operand:VDQW 2 "s_register_operand" >>> "w") >>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-1.c >>> b/gcc/testsuite/gcc.target/arm/vect-div-1.c >>> new file mode 100644 >>> index 0000000..e562ef3 >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-1.c >>> @@ -0,0 +1,14 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target arm_v8_neon_ok } */ >>> +/* { dg-options "-O2 -funsafe-math-optimizations -ftree-vectorize >>> -fdump-tree-vect-all" } */ >>> +/* { dg-add-options arm_v8_neon } */ >> >> No this is wrong. >> >> What is armv8 specific about this test ? This is just like another test that >> is for Neon. vrecpe / vrecps are not instructions that were introduced in >> the v8 version of the architecture. They've existed in the base Neon >> instruction set. The code generation above in the patterns will be enabled >> when TARGET_NEON is true which can happen when -mfpu=neon >> -mfloat-abi={softfp/hard} is true. >> >>> + >>> +void >>> +foo (int len, float * __restrict p, float *__restrict x) >>> +{ >>> + len = len & ~31; >>> + for (int i = 0; i < len; i++) >>> + p[i] = p[i] / x[i]; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */ >>> diff --git a/gcc/testsuite/gcc.target/arm/vect-div-2.c >>> b/gcc/testsuite/gcc.target/arm/vect-div-2.c >>> new file mode 100644 >>> index 0000000..8e15d0a >>> --- /dev/null >>> +++ b/gcc/testsuite/gcc.target/arm/vect-div-2.c >>> @@ -0,0 +1,14 @@ >>> +/* { dg-do compile } */ >>> +/* { dg-require-effective-target arm_v8_neon_ok } */ >> >> And likewise. >> >>> +/* { dg-options "-O2 -funsafe-math-optimizations -fno-reciprocal-math >>> -ftree-vectorize -fdump-tree-vect-all" } */ >>> +/* { dg-add-options arm_v8_neon } */ >>> + >>> +void >>> +foo (int len, float * __restrict p, float *__restrict x) >>> +{ >>> + len = len & ~31; >>> + for (int i = 0; i < len; i++) >>> + p[i] = p[i] / x[i]; >>> +} >>> + >>> +/* { dg-final { scan-tree-dump-times "vectorized 0 loops" 1 "vect" } } */ >> >> >> regards >> Ramana