Torbjorn Granlund <t...@gmplib.org> writes: > I agree this method could make a faster SB mpn_tdiv_r, but I am > unconvinced that you can get the quotient with fewer operatons than the > current code.
The method specialized for div_qr_1 did beat the udiv_qrnnd_preinv loop (at least when comparing the x86_64 code). And the needed quotient book-keeping should be very similar for larger divisors, at least for the variant which use a limb + bit multiplier v. I suspect the main drawbacks compared to current SB div_qr is higher probability of the bignum adjustment, an O(n^2) cost, and more expensive precomputation, O(n). None of these were relevant for div_qr_1. And the main advantage is that the depenency from one addmul_1 to the next involves no multiplication, an O(n) gain. Marco suggested (in private mail) that one can predict most of the carries early, and get the adjustment probability small, but I haven't looked into that. I should perhaps also mention the special case which got me onto this track: Moduli where the top limb already is B-1. Then there's no precomputation, and the adjustment probability is very low (since B^{n} - M is at most n - 1 limbs). Such moduli are used in several standard groups specified for diffie-hellman and ecc use. My conclusion for now is that for such groups, there's no point in using montgomery multiplication, plain mod using this method should be fast enough. > You should implement a SB mpn_tdiv_r based on your ideas. It ought to > outperform the current code most when the divisor is small and the > dividend is fairly large. I'd like to, but it's unlikely to happen very soon. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid C0B98E26. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel