t...@gmplib.org (Torbjörn Granlund) writes:

> ni...@lysator.liu.se (Niels Möller) writes:
>
>   Maybe easier to wait until asm files are updated so that
>   HAVE_NATIVE_mpn_gcd_1 implies HAVE_NATIVE_mpn_gcd_11. Or was it some
>   particular call site you had in mind?
>
> I have seen calls to gcd_1 which could use gcd_11, presumably from
> mpn_gcd (or its descendant),

Right, mpn_gcd usually ends with a call to mpn_gcd_1 or gcd_2, and the
latter also usually ends with a call to gcd_1. But I think it's easiest
to leave that as is until we have a good gcd_22.

> I expect asm gcd_1 to disappear as the C code should be equivalent. Do
> you agree?

That would be nice. There will be one more function call, but hopefully
that's not going to be a significant performance regression.

> I suppose some hardwired stuff for the case u >> v (not bitshift, the
> mathematical meaning of >>!) might want to be parameterised and also
> ideally tune/tuneup'ed.

Should that be done by gcd_1 only? Or do we need some variant of gcd_11
with an initial division?

> (16 seems like a huge default value, btw.)

My workstation (Intel Broadwell) takes 96 cycles for a division (if I
read https://gmplib.org/~tege/x86-timing.pdf correctly; or is there a
faster 64/64 div instruction?). And gcd_11 runs at roughly 4 cycles per
input bit according to speed. So the threshold should be around
96 / 4 = 24.

Below is a patch to add a gcd_11 entry point for this arch. It passes
make check, but it would be good to also test with devel/try.

Regards,
/Niels

diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4 gmp-gcd_11/mpn/asm-defs.m4
*** gmp-gcd_11.cdf9e11a028b/mpn/asm-defs.m4	2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/asm-defs.m4	2019-08-06 19:25:58.225354000 +0200
*************** define_mpn(dump)
*** 1395,1398 ****
--- 1395,1399 ----
  define_mpn(gcd)
  define_mpn(gcd_1)
+ define_mpn(gcd_11)
  define_mpn(gcdext)
  define_mpn(get_str)
diff -Nprc2 gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm
*** gmp-gcd_11.cdf9e11a028b/mpn/x86_64/core2/gcd_1.asm	2019-08-06 17:16:07.000000000 +0200
--- gmp-gcd_11/mpn/x86_64/core2/gcd_1.asm	2019-08-06 19:25:58.225354000 +0200
*************** L(top):	cmovc	%r10, %rax		C if x-y < 0
*** 137,141 ****
  	cmovc	%r9, v0			C use x,y-x	0,3  0,3  2,8  1,7  1,7
  L(mid):	shr	R8(%rcx), %rax		C		1,7  1,6  2,8  2,8  2,8
! 	mov	v0, %r10		C		1    1    4    3    3
  	sub	%rax, %r10		C		2    2    5    4    4
  	bsf	%r10, %rcx		C		3    3    6    5    5
--- 137,141 ----
  	cmovc	%r9, v0			C use x,y-x	0,3  0,3  2,8  1,7  1,7
  L(mid):	shr	R8(%rcx), %rax		C		1,7  1,6  2,8  2,8  2,8
! L(odd):	mov	v0, %r10		C		1    1    4    3    3
  	sub	%rax, %r10		C		2    2    5    4    4
  	bsf	%r10, %rcx		C		3    3    6    5    5
*************** L(end):	pop	%rcx			C common twos
*** 150,151 ****
--- 150,163 ----
  	ret
  EPILOGUE()
+ 
+ 	TEXT
+ 	ALIGN(16)
+ PROLOGUE(mpn_gcd_11)
+ 	FUNC_ENTRY(2)
+ 	xor	%ecx, %ecx
+ 	push	%rcx
+ 	mov	%rdi, %rax
+ 	mov	%rsi, v0
+ 	jmp	L(odd)
+ EPILOGUE()
+ 

-- 
Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677.
Internet email is subject to wholesale government surveillance.
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
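
For readers following the threshold discussion in the message above, here
is a rough C sketch of the idea. It is not GMP's code: the names
gcd_11_sketch, gcd_1limb_sketch and estimate_threshold_bits are
hypothetical, 64-bit limbs are assumed, and the cost model (one division
pays off once it replaces about as many cycles of binary gcd work) is only
one reading of the 96 cycles / 4 cycles-per-bit estimate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: size difference (in bits) beyond which one
   division is assumed cheaper than letting the binary loop chew
   through the extra bits.  With the figures quoted in the message,
   96 / 4 = 24. */
static unsigned estimate_threshold_bits(unsigned div_cycles,
                                        unsigned cycles_per_bit)
{
    return div_cycles / cycles_per_bit;
}

/* Binary gcd of two odd single-limb values (the gcd_11 contract
   discussed above: both inputs odd). */
static uint64_t gcd_11_sketch(uint64_t u, uint64_t v)
{
    assert((u & 1) && (v & 1));
    while (u != v) {
        if (u < v) {            /* keep the larger value in u */
            uint64_t t = u;
            u = v;
            v = t;
        }
        u -= v;                 /* odd - odd: even and nonzero */
        while ((u & 1) == 0)    /* strip the factors of two */
            u >>= 1;
    }
    return u;
}

/* gcd of a nonzero u and an odd v, with an optional initial division
   when u is much larger than v ("u >> v" in the mathematical sense). */
static uint64_t gcd_1limb_sketch(uint64_t u, uint64_t v,
                                 unsigned threshold_bits)
{
    assert(u != 0 && (v & 1) && threshold_bits < 64);
    if ((u >> threshold_bits) > v) {
        u %= v;                 /* one division removes the size gap */
        if (u == 0)
            return v;
    }
    while ((u & 1) == 0)        /* v is odd, so twos in u are irrelevant */
        u >>= 1;
    return gcd_11_sketch(u, v);
}
```

A caller would pass something like estimate_threshold_bits(96, 4) as
threshold_bits. In practice the value would presumably come from
tune/tuneup rather than from a fixed formula, as suggested in the quoted
text.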