Hi, I think I have (re-)discovered a different schoolbook division algorithm. I'll describe the case that we only want the remainder. I'm fairly confident one can assemble the quotient as well, with a bit of extra book-keeping, but I'm less sure that it can be faster than a schoolbook division loop around 3/2 division.
Say we have an n-limb number M. Then there exists a number v such that v M = B^{n+1} - M' where M' is at most n limbs. There are multiple alternatives, in particular if the top limb of M is not normalized. One choice is v = floor [B^{n+1} / M] This is useful if we rewrite it as B^{n+1} = M' (mod M) Now consider an U of n+2 limbs, U = u_{n+1} B^{n+1} + u_n B^n + ... + u0. Then we can reduce it by one limb using u_{n+1} B^{n+1} = u_{n+1} M' (mod M) where the righthand side is n+1 limbs. When adding to the other low u limbs, we may get a carry, and in this case we'd need to add in M' one more time (and then we can't get another carry, if I'm getting the bounds right). So the main loop would be something like hi = up[--un]; while (un > mn) { hi = mpn_addmul_1 (up + un - mn - 1, mp, mn, hi); ul = up[--un]; hi += ul; if (hi < ul) hi += mpn_add_n (up + un - mn, mp, mn); } Here "mp" should point to the value M' above. Not sure what the statistics for the carry is, but it's going to become less likely the more unnormalized the top limb of M is. Once U is reduced to n + 1 limbs, one needs to do two more or less plain divisions to get a final remainder in the range 0 <= R < M. The needed inverses are intimately related to the value v computed above. When can this be useful? Some thoughts: * If the carry is reasonably unlikely, then I think this may be able to beat plain schoolbook, since there are fewer and simpler adjustments. It's looks like a left-to-right redc, but with more complex carry handling. * It may be attractive for a "constant time" division. Then it doesn't really matter if the adjustment is unlikely, since we're going to need some adjustment step, and we have to implement it using mpn_addcnd_n. * If we want to try to get the quotient as well, then I think we should go for a v which is a single limb (and a fix one it). Let 2^s M be normalized, and set v = floor [(B^{n+1} - 1) / (2^s M)]. We then get the quotients by multiplying the sequence of top limbs in the algorithm by (B + v), either during the loop or an additonal mul_1 after the loop. * The multiplier v can be thought of as an (n+1)/n inverse. It can be computed by first computing a 3/2 inverse v, then forming (B+v) * M, and based on that one can both adjust v and compute M'. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel