ni...@lysator.liu.se (Niels Möller) writes: I was been thinking of the following algorithm (I think I wrote about the mod variant a while ago). Say we want to divide by an n-limb number D, and for simplicity, assume that D is normalised. First precompute the inverse v = floor(B^{n+1} / D) (one limb and a bit), and the corresponding n-limb remainder, r, so that I assume v is sometimes a limb and two bits, but only for one D. :-)
Do you mean to compute precisely the above value, not some very close limb+bit approximation (looking at, say the two most significant limbs of D)? B^{n+1} = v * D + R So r and R are the same? Now, if U is an n+2 limb number, we can reduce it as u_{n+1}^B{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' Something is strange there. It seems that u_{n+1}*B^{n+1} + U' = u_{n+1} v D + u_{n+1} R + U' makes more sense. Here, the product u_{n+1} * v is two limbs + a bit to be added to the quotient, and U' + u_{n+1} r is n+1 limbs and a carry out. We can iterate this to reduce u from n+m limbs to n + 1 (we then need something different for the final quotient limb). U is n+2 limbs (per 6 lines above) and u is a different number of n+m limbs? OK, I think I see what you're doing. You're allowing some greater partial remainders, and will then (typically) generate a limb+bit quotient. This is highly interesting. For side-channel silent division, we'd need to store away the q limbs and add them up in a single loop after the main loop. The carry from addmul_1 would have to be handled inside the loop, not sure how to best do that. That is similar to what I do in my proposed half-limb-at-a-time code. For general schoolbook division, it's nice that we use addmul_1 rather than submul_1, and when addmul_2 is available it might make sense to extend to two quotient limbs at a time. Absolutely. We should avoid submul_1 if we can. Ah, and one more thing: For the applications which are going to need side-channel silent division, I think the common case is a mod operation with invariant divisor. In this case, I think the above idea (but without all the quotient book-keeping) is a pretty good alternative to schoolbook. I miss your point here. Are you thinking of replacing REDC with the above trick in, say, powm? I am mainly aiming at the a * B^n mod m operation for converting to Montgomery residues, and of CRT. -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel