t...@gmplib.org (Torbjörn Granlund) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>   Maybe easier to wait until asm files are updated so that
>   HAVE_NATIVE_mpn_gcd_1 implies HAVE_NATIVE_mpn_gcd_11. Or was it some
>   particular call site you had in mind?
>
> I have seen calls to gcd_1 which could use gcd_11, presumably from
> mpn_gcd (or its descendant),

Right, mpn_gcd usually ends with a call to mpn_gcd_1 or gcd_2, and the
latter also usually ends with a call to gcd_1. But I think it's easiest
to leave that as is until we have a good gcd_22.

> I expect asm gcd_1 to disappear as the C code should be equivalent.  Do
> you agree?

That would be nice. There will be one more function call, but hopefully
that's not going to be a significant performance regression.

> I suppose some hardwired stuff for the case u >> v (not bitshift, the
> mathematical meaning of >>!) might want to be parameterised and also
> ideally tune/tuneup'ed.

Should that be done by gcd_1 only? Or do we need some variant of gcd_11
with an initial division?

> (16 seems like a huge default value, btw.)

My workstation (Intel Broadwell) takes 96 cycles for a division (if I
read https://gmplib.org/~tege/x86-timing.pdf correctly; or is there a
faster 64/64 div instruction?). And gcd_11 runs at roughly 4 cycles per
input bit according to tune/speed. So the threshold should be around
96 / 4 = 24 bits.
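
To make the threshold arithmetic concrete, here is a minimal C sketch of the single-limb case, assuming 64-bit limbs and GCC/Clang's __builtin_ctzll. gcd_11_sketch is a plain binary GCD for two odd limbs, and gcd_1_sketch first reduces u with one division when it exceeds v by more than DIV_THRESHOLD_BITS bits, instead of chipping away at those bits one loop iteration at a time. The function names and the DIV_THRESHOLD_BITS constant are hypothetical, not GMP's actual code or tuning parameter; 24 is only the 96 / 4 estimate above.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs */

/* Hypothetical threshold: divide first if u has more than this many bits
   beyond v. 24 ~= 96 cycles per division / 4 cycles per bit of gcd_11. */
#define DIV_THRESHOLD_BITS 24

/* Plain binary GCD of two odd, nonzero limbs (the gcd_11 contract). */
static limb_t gcd_11_sketch(limb_t u, limb_t v)
{
  assert((u & 1) && (v & 1));
  while (u != v)
    {
      if (u > v)
        {
          u -= v;                    /* odd - odd: even and nonzero */
          u >>= __builtin_ctzll(u);  /* strip twos, u is odd again */
        }
      else
        {
          v -= u;
          v >>= __builtin_ctzll(v);
        }
    }
  return u;
}

/* Single-limb gcd_1 sketch: v must be odd and nonzero, u arbitrary. */
static limb_t gcd_1_sketch(limb_t u, limb_t v)
{
  assert(v & 1);
  /* u >> v in the mathematical sense: one division is cheaper than
     removing the extra bits with the subtract-and-shift loop. */
  if ((u >> DIV_THRESHOLD_BITS) > v)
    u %= v;
  if (u == 0)
    return v;
  u >>= __builtin_ctzll(u);  /* v is odd, so removing twos from u is safe */
  return gcd_11_sketch(u, v);
}
```

Making DIV_THRESHOLD_BITS a tunable constant rather than a literal is what would let tune/tuneup pick it per CPU.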

Below is a patch adding a gcd_11 entry point for this arch. It passes
make check, but it would be good to also test it with devel/try.

Regards,
/Niels

diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4 gmp-gcd_11/mpn/asm-defs.m4
*** gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4     2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/asm-defs.m4  2019-08-06 19:25:58.225354000 +0200
*************** define_mpn(dump)
*** 1395,1398 ****
--- 1395,1399 ----
  define_mpn(gcd)
  define_mpn(gcd_1)
+ define_mpn(gcd_11)
  define_mpn(gcdext)
  define_mpn(get_str)
diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm
*** gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm  2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm       2019-08-06 19:25:58.225354000 +0200
*************** L(top): cmovc   %r10, %rax              C if x-y < 0
*** 137,141 ****
        cmovc   %r9, v0                 C use x,y-x    0,3  0,3  2,8  1,7  1,7
  L(mid):       shr     R8(%rcx), %rax          C              1,7  1,6  2,8  2,8  2,8
!       mov     v0, %r10                C              1    1    4    3    3
        sub     %rax, %r10              C              2    2    5    4    4
        bsf     %r10, %rcx              C              3    3    6    5    5
--- 137,141 ----
        cmovc   %r9, v0                 C use x,y-x    0,3  0,3  2,8  1,7  1,7
  L(mid):       shr     R8(%rcx), %rax          C              1,7  1,6  2,8  2,8  2,8
! L(odd):       mov     v0, %r10                C              1    1    4    3    3
        sub     %rax, %r10              C              2    2    5    4    4
        bsf     %r10, %rcx              C              3    3    6    5    5
*************** L(end): pop     %rcx                    C common twos
*** 150,151 ****
--- 150,163 ----
        ret
  EPILOGUE()
+ 
+       TEXT
+       ALIGN(16)
+ PROLOGUE(mpn_gcd_11)
+       FUNC_ENTRY(2)
+       xor     %ecx, %ecx
+       push    %rcx
+       mov     %rdi, %rax
+       mov     %rsi, v0
+       jmp     L(odd)
+ EPILOGUE()
+       
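
The new entry point reuses the tail of mpn_gcd_1. Judging from the hunk, L(end) pops a saved "common twos" count and presumably shifts the result left by it, so mpn_gcd_11 pushes a zero count, loads u into %rax and v into v0, and jumps to L(odd), skipping the shr at L(mid) since u is already odd. Below is a rough C analogue of that control flow, assuming 64-bit limbs and GCC/Clang's __builtin_ctzll; gcd_tail, gcd_11_entry and gcd_1_entry are hypothetical names, and gcd_1_entry handles only a single nonzero limb, unlike the real mpn_gcd_1.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* assumption: 64-bit limbs */

/* Shared loop and epilogue, standing in for the asm from L(odd) to L(end).
   u and v must be odd; twos plays the role of the pushed "common twos". */
static limb_t gcd_tail(limb_t u, limb_t v, unsigned twos)
{
  while (u != v)
    {
      if (u > v)
        {
          u -= v;
          u >>= __builtin_ctzll(u);
        }
      else
        {
          v -= u;
          v >>= __builtin_ctzll(v);
        }
    }
  return u << twos;  /* like shifting by the count popped at L(end) */
}

/* gcd_11-style entry: both inputs odd, so the common twos count is zero
   (compare xor %ecx, %ecx; push %rcx in the patch). */
static limb_t gcd_11_entry(limb_t u, limb_t v)
{
  assert((u & 1) && (v & 1));
  return gcd_tail(u, v, 0);
}

/* gcd_1-style entry for two nonzero limbs: record the common power of two,
   make both operands odd, then fall into the same tail. */
static limb_t gcd_1_entry(limb_t u, limb_t v)
{
  assert(u != 0 && v != 0);
  unsigned tu = __builtin_ctzll(u);
  unsigned tv = __builtin_ctzll(v);
  unsigned twos = tu < tv ? tu : tv;
  return gcd_tail(u >> tu, v >> tv, twos);
}
```

Both entries share one loop body, so the only extra work on the gcd_11 path is the register setup and a zero shift count.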

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
