We now have the algorithm while (highlimb(U) | highlimb(V)) != 0 S = U - V T = V - U cnt = count_trailing_zeros(lowlimb(S)) cnd1 = U < V if (cnd1) V = U U = T else U = S U >>= cnt
with some variation. The "if" statement is performed with masking tricks or conditional move/conditional select. The main performance problem with that code is the dependency on lowlimb(U) in the conditional assignment, through the double-limb shift, to the S and T assignments. The looping criteria is a smaller problem, as branch prediction takes care of that. I think we might get more speed from this algorithm: while (highlimb(U) | highlimb(V)) != 0 S = U - V T = V - U cnt = count_trailing_zeros(lowlimb(S)) cnd1 = U < V cnd2 = highlimb(U) < highlimb(V) if (cnd2) V = U U = T else U = S if (cnd1 != cnd2) { fix mistake! } U >>= cnt; Why would it be faster? Because while cnd1 needs to wait for the previous iteration's double-limb shifting, cnd2 depends on just the single-limb shifting of the high U world. We might make the wrong decision iff highlimb(U) = highlimb(V) in which case cnd1 might be true (cnd2 is false of course). But when highlimb(U) = highlimb(V) then either highlimb(U-V) = 0 or highlimb(V-U) = 0, so we're about to go into a gcd_21 situation. (We might also have highlimb(U-V) = highlimb(V-U) = 0, in which case we're done completely.) With some inspiration of Richard Sites, Alpha architect, I say: It's the critical path, stupid! :-) -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel