Now we have a nice set of x86_64 gcd_22 routines. The code is not as well tuned as the gcd_11 code, but it runs somewhat fast.
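For readers unfamiliar with the naming, here is a minimal C sketch of what a gcd_22 computes, assuming the name means "gcd of two 2-limb operands" with 64-bit limbs and that both inputs are odd. It is a plain binary-gcd reference, not the tuned assembly discussed here. The names gcd_22_ref and ctz128 are hypothetical, and unsigned __int128 and __builtin_ctzll are GCC/Clang extensions.

```c
#include <stdint.h>

/* Two 64-bit limbs modelled as one 128-bit value (GCC/Clang extension). */
typedef unsigned __int128 u128;

/* Count trailing zero bits of a nonzero 128-bit value (hypothetical helper). */
static int ctz128(u128 x)
{
    uint64_t lo = (uint64_t)x;
    if (lo != 0)
        return __builtin_ctzll(lo);
    return 64 + __builtin_ctzll((uint64_t)(x >> 64));
}

/* Hypothetical reference gcd for two ODD 2-limb values.
   Each iteration subtracts the smaller operand from the larger one
   (the difference of two odd numbers is even) and then strips all
   trailing zero bits, so at least one bit of progress is made. */
static u128 gcd_22_ref(u128 u, u128 v)
{
    while (u != v) {
        if (u > v) {
            u -= v;
            u >>= ctz128(u);
        } else {
            v -= u;
            v >>= ctz128(v);
        }
    }
    return u;
}
```

A real gcd_22 would keep the two limbs of each operand in separate registers and avoid the unpredictable branch; the sketch only pins down the arithmetic.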
I haven't explored the table-based variant, which gives 3 bits of progress per iteration. It might make the new code obsolete for machines with fast multiply.

Now what? Should we have gcd_33, gcd_44, and gcd_55 also? :-)

(It is clear that these could improve speed greatly, and with the gcd_22 code in place they would not be hard to write. Well, gcd_44 and above would not be able to keep things in the 15 usable registers of x86_64.)

-- 
Torbjörn
Please encrypt, key id 0xC8601622
_______________________________________________
gmp-devel mailing list
gmp-devel@gmplib.org
https://gmplib.org/mailman/listinfo/gmp-devel