There was a bug in the "euclides 2-choice binary" function. Now, with the intended algorithm, we get almost 4 bits per iteration.
Algorithm Iterations Iter/Bit Bit/Iter Max iter plain binary : 378420598 0.697 1.435 58 binary+- : 329732327 0.607 1.647 55 binary+- alternating : 419867135 0.773 1.293 70 binary tab4 : 332015652 0.611 1.635 56 binary tab16 : 239589147 0.441 2.266 42 binary tab64 : 217666974 0.401 2.495 40 binary tab256 : 206886403 0.381 2.625 40 binary tab1024 : 197076246 0.363 2.755 35 binary tab4096 : 193746095 0.357 2.803 36 plain euclides : 313383405 0.577 1.733 61 euclides binary : 197579996 0.364 2.748 40 euclides 2-choice binary : 139135737 0.256 3.903 26 Here is the corrected version: static mp_limb_t gcd_euclidespm (mp_limb_t a, mp_limb_t b) { int cnt; mp_limb_t r, r2; r = a % b; while (r != 0) { a = b; r2 = b - r; count_trailing_zeros (cnt, r); r >>= cnt; count_trailing_zeros (cnt, r2); r2 >>= cnt; if (r > r2) b = r2; else b = r; r = a % b; } return b; } I tried this for gcd_11 on some arm64 CPUs. They get close to the current pure binary implementations, but they don't beat them. -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel