Re: gcd_22

Torbjörn Granlund Fri, 23 Aug 2019 02:14:15 -0700

ni...@lysator.liu.se (Niels Möller) writes:

  The below implementation appears to pass tests, and give a modest
  speedup of 0.2 cycles per input bit, or 2.5%. Benchmark, comparing C
  implementations of gcd_11 and gcd_22:


Beware of "turbo" when counting cycles!  (Relative measurements like
gcd_11 vs gcd_22 should be fine!)

The speed difference between C gcd_11 and gcd_22 is surprisingly small!
Perhaps gcd_11 should be rewritten in the style of gcd_22?
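
For context, here is a minimal sketch of the kind of loop a
single-limb gcd performs, assuming both operands are odd and fit in
one 64-bit word. The name gcd_11_sketch and the use of uint64_t are
illustrative assumptions; this is neither GMP's mpn_gcd_11 nor the
gcd_22 code discussed in this thread.

```c
#include <stdint.h>

/* Hypothetical illustration only, not GMP code.
   Binary gcd of two odd single-word operands. */
static uint64_t gcd_11_sketch(uint64_t u, uint64_t v)
{
    /* Precondition (assumed): u and v are both odd. */
    while (u != v) {
        uint64_t lo = u < v ? u : v;          /* smaller operand */
        uint64_t d  = u < v ? v - u : u - v;  /* |u - v|, even and nonzero */

        /* Strip the factors of two; gcd(lo, d) is unchanged
           because lo is odd. */
        while ((d & 1) == 0)
            d >>= 1;

        u = lo;
        v = d;
    }
    return u;
}
```

A tuned version would typically replace the inner loop with a
count-trailing-zeros instruction and avoid the data-dependent branches,
which is where most of the cycles per input bit go.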


I did not provide a top-level gcd_22 for x86_64 as you might have seen.
The one similar to x86_64/gcd_11.asm is probably x86_64/k8/gcd_22.asm.
Perhaps it should be moved.

But as far as I can tell, that function is slower than your C gcd_22 on
some platforms, such as Intel Haswell.

I'm curious if your C code could be made into competitive asm.  One
can usually beat the compiler by some 10-30%.

Measurements for gcd_11/22 for most of our machines are in.  See
https://gmplib.org/devel/tm/gmp/date.html and click on any HOSTgentoo64
tuneup link.  Scroll down; after the normal *_THRESHOLD stuff come
comparative measurements of asm code.  (The mpn/generic code is not
usually measured; the exception is when it appears in the default
column.  I plan to fix this some day, and have a few columns "gcc -O",
"gcc -Os", "gcc -O2".)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel
