"Marco Bodrato" <bodr...@mail.dm.unipi.it> writes:

  Before the changes I just pushed, I simply reordered the steps in the
  loop to shorten the first and the last iteration in the loop...

Resulting in even better performance, I presume?

  > How much speed difference is there now, for k = 4 vs sqrt(sqrt())?

         mpn_sqrt  mpn_root.4  mpn_root.8  mpn_root.16
     1     #33.86      659.37      229.62       178.01
     2    #112.47      916.25      789.81       273.80
     4    #245.35     1350.10     1111.97      1117.45
     8    #419.01     1934.60     1683.93      1570.81
    16   #1015.91     2611.44     2558.12      2472.39
    32   #1666.53     3837.72     3969.28      4031.27
    64   #3305.67     6295.23     6160.25      6654.23

  > Is the difference small enough that we could fix it by running the first
  > few iterations using plain limb arithmetic?

  I fear it is not.

That is indeed evident from that data. E.g., the delta between sizes 2
and 4 is more than twice as large for root as for sqrt.

-- 
Torbjörn
Please encrypt, key id 0xC8601622
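
For readers who want to see the "k = 4 vs sqrt(sqrt())" comparison in its simplest form, here is a minimal single-limb sketch in plain C. It is not GMP code and does not reflect how mpn_sqrt or mpn_root are implemented; the names isqrt_u64 and iroot4_u64 are hypothetical, and a 64-bit limb is assumed. It only illustrates that a floor fourth root can be obtained either by one Newton iteration for k = 4 or by two nested floor square roots, using nothing but ordinary limb arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: floor(sqrt(n)) for one 64-bit limb,
   by Newton's iteration in plain integer arithmetic. */
static uint64_t isqrt_u64(uint64_t n)
{
    if (n == 0)
        return 0;
    uint64_t x = n / 2 + 1;               /* always >= floor(sqrt(n)) */
    for (;;) {
        uint64_t y = (x + n / x) / 2;     /* Newton step for x^2 = n */
        if (y >= x)
            return x;                     /* no longer decreasing: done */
        x = y;
    }
}

/* Hypothetical helper: floor(n^(1/4)) for one 64-bit limb,
   by Newton's iteration for k = 4. */
static uint64_t iroot4_u64(uint64_t n)
{
    if (n == 0)
        return 0;
    uint64_t x = (uint64_t)1 << 16;       /* > any 4th root of a 64-bit n */
    for (;;) {
        assert(x != 0);                   /* holds for every n >= 1 */
        uint64_t x3 = x * x * x;          /* <= 2^48, cannot overflow */
        uint64_t y = (3 * x + n / x3) / 4; /* Newton step for x^4 = n */
        if (y >= x)
            return x;
        x = y;
    }
}
```

Under these assumptions, iroot4_u64(n) and isqrt_u64(isqrt_u64(n)) return the same value for every 64-bit n, since taking the floor square root twice gives the floor fourth root. The fixed starting guess in iroot4_u64 keeps the sketch short but costs extra iterations for small n; a real implementation would pick the starting point from the bit length of n.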