ni...@lysator.liu.se (Niels Möller) writes: Updated version below, without input_v and corresponding divisions. Also one of the binary variants use binv. While debugging, I found that the accumulated shift may well exceed GMP_LIMB_BITS (shouldn't be so surprising), but I think it must be < 2 * GMP_LIMB_BITS. And there are a few corner cases.
I guess it's possible to do all cofactor shifts as part of the loop, by shifting only one bit per iteration, and making the subtraction conditional on both nubmers being odd. That would be like a single-limb variant of mpn_sec_invert (but with a stop condition to exit early). Unclear if that can be a winner, with less progress per iteration in the main loop. And it may produce a cofactor that's unnecessarily large in the g > 1 case. BTW, the binv/redc part uses an umulhi operation. Maybe we should add that to longlong.h? I believe umul_ppmm with unused low result is adequate. When there are separate instructions for low and high parts, we have made sure to implement umul_ppmm in such a way that the compiler will eliminate the low part. How far did you come with gcdext_1? Do you have any timing improvements to report? :-) -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel