> count = (49 * bits + 57) / 17; > > Odd. For sure. This isn't based on local progress of the algorithm (there ain't no guaranteed progress for a short sequence of reduction steps), but based on rather complex analysis of the number of steps needed for the complete 2-adic quotient sequence.
That count is a proven "upper bound" of the numbver of iterations. Does the paper discuss how tight it is, i.e., is it close to a "lower bound" of the required number of iterations. > I think your measurements are a bit optimistic, though. My measruments > from slightly modified timing code suggest 4.5x slowdown, and more for > really small operands. Maybe, I tried to keep the timing really simple and portable. I simply up'ed the number of iterations (actually made them depend in the operand size). I've now implemented inverse too. See updated code below. When I run it, I get invert size 1, ref 0.293 (us), djb 1.145 (us) invert size 2, ref 0.721 (us), djb 1.659 (us) invert size 3, ref 0.994 (us), djb 2.375 (us) [...] invert size 17, ref 5.157 (us), djb 18.538 (us) invert size 18, ref 5.207 (us), djb 19.064 (us) invert size 19, ref 5.702 (us), djb 21.286 (us) so a slowdown of 3--4 compared to mpz_invert. This timing excludes the precomputation of a few needed per-modulo constants. Very very nice! People not famililiar with what we're trying to do here might be unimpressed with these results. "Look, I've written new code which is 4 times slower than what we have, HURRAY!" Comparing to the existing side-channel-silent inversion code would impress more, I think. I think the inverse flavor is still rather neat, main loop is for (delta = 1;count > STEPS;count -= STEPS) { delta = steps (&M, STEPS, delta, fp[0], gp[0]); matrix_vector_mul (&M, STEPS, n+1, fp, gp, tp); matrix_vector_mul_mod (&M, Bp, n+2, up, vp, tp); } I can make it even neater: for (delta = 1;count > STEPS;count -= STEPS) { do_it_all (fp, gp, up, vp, n); } :-) I'm thinking I'll try to implement and benchmark this for Nettle's ecc functions first, before attempting to update GMP function. The reason is that (i) I really want to do needed precomputations for all moduli of interest for Nettle at compile time, which would complicate the GMP interface if it is supported at all, and (ii) in some cases I want inversion to operate on numbers in montgomery representation, and then I'd like to fold any needed additional power of two into the precomputed constant. What percentage of Ec scalar mul is used for the final inversion? Not much, I suppose. Does Nettle use exponentiation there today, or mpn_sec_invert? -- Torbjörn Please encrypt, key id 0xC8601622 _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel