ni...@lysator.liu.se (Niels Möller) writes: Let us review the differences between bdiv and redc, and focus on the balanced case (since that's what redc uses). Good idea!
We have an odd modulo M of size n limbs, and a numerator U of size 2n limbs. bdiv computes Q = U M^-1 (mod B^n) R = B^{-n} (U - Q M) where Q is n limbs, and R is n limbs plus a bit, it's in the range -M < R < B^n The redc functions in gmp used to generate an R of n limbs and a Q of n limbs and a bit, so we had quite a bit of impedance mismatch. But with the current code, it seems that the redc functions compute Q = - U M^{-1} (mod B^n) R = B^{-n} (U + Q M) where Q is n limbs (but not returned), and R is n limbs plus a bit, 0 <= R < B^n + M The redc functions return a carry, while the bdiv functions return a borrow. I must admit that I didn't have this completely clear in my mind. I think the reason for the redc changes was to let the caller decide if the final conditional subtraction should be done unsing a constant time method (powm_sec) or faster but with data-dependent timing (regular powm). Yes. (And in some situation, the subtraction is not needed at all. Paul Zimmermann told me that the GMP-ECM code does not need it. It might be possible to do without it also in powm.) But this means that if we were to implement a bdiv_r consistent with the other bdiv functions, one could replace cy = mpn_redc_* (rp, up, mp, n, ...) if (cy != 0) mpn_sub_n (rp, rp, mp, n) by cy = mpn_bdiv_r (rp, up, 2*n, mp, n, ...)); if (cy != 0) mpn_add_n (rp, rp, mp, n); What you're suggesting, is, I suppose: 1. Implement mpn_bdiv_r (at SB level, DC level, MU level) 2. Get rid of specific mpn_redc_* and use mpn_bdiv_r instead I think this is a good idea. We need to carefully watch for slowdown. We need to generalise mpn/x86_64/redc_1.asm to become a mpn_sbpi1_bdiv_r. (I have several other asm redc_1 and recd_2 which are not yet pushed.) Here is a slight disadvantage of getting rid of redc: mpn_sbpi1_bdiv_r will necessarily be more complex, probably using two outer loops than mpn_redc_1. This matters for asm implementations. That would be nice cleanup, I think. Now there are a couple of other differences between the bdiv and redc interfaces: 1. The bdiv interface places the remainder at the high end of the U area, while redc arranges to put it at the low end. Can or should we change that? I guess it's possible to use a different convention for a new bdiv_r than for bdiv_qr. I guess the currrent bdiv_qr convention is beneficial for the dc routines, but I haven't looked at that code recently. It would be trivial to fix at least mpn_sbpi1_bdiv_qr to pt the remainder in the low end. It probably makes some sense. Or perhaps we allow the quotient to be stored there? 2. Obviously, the bdiv functions return the quotient. It seems mpn_sbpi1_bdiv_qr uses a loop very similar to redc, with addmul_1 rather than submul_1. And then there's some logic to negate the quotient at the end; all that can be omitted for bdiv_r. Indeed. 3. There's a redc_2 function, but no pi2_bdiv functions. That ought to be fixed (irrespective of the suggested merge). 4. There's a dc_bdiv function, but no redc_dc. After some analysis, I think there's a place for redc_dc, even if we assume that the inverse computation for redc_n is free. This is indeed an omission. 5. There's a mu_bdiv function, using a smaller inverse than redc_n, which is the closest corresponding function on the redc side. I guess it makes sense to always use a full inverse for redc, given that it's used mainly when the modulo is invariant? There should be a mpn_preinv_mu_bdiv_qr (we have the analogous Euclidian division function). That would cover the current redc_n, but also allow for smaller pre-inverses for when the divisor/modulus invariance is smaller. -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel