ni...@lysator.liu.se (Niels Möller) writes: > One could attempt to omit q_1 and instead compute a more accurate q_0 as > > q_0 = LO(u_1 * v) + u_0 + HI(u_0 * v) > > and remainder would then hopefully be > > r = HI(f * d) > > That's one more multiplication, if I'm not missing something,
Had a closer look, and this won't work. To get close to working, one needs a more accurate recipropcal, say {v_1, v_0} = floor [(B^3 - 1)/d] - B^2 and add one more term to the fraction: q_0 = LO(u_1 * v_1) + u_0 + HI(u_0 * v_1) + HI(u_1 * v_0) So that's yet another multiplication, and I think it's still not accurate enough to get a correct remainder. HI(q_0 * d) will now be close, but I think it can still be one off. If one adds in the carry from the next lower fraction limb, LO(u0 * v1) + LO(u1 * v0) + HI(u0 * v0) it will probably work, but that's yet another multiply and a couple of additions. *Maybe* one can get away with only the LO parts. So with a two-limb reciprocal, the multiplication is +---+---+---+---+ |u_1|u_0|u_0*v_0| +---+---+---+---+ |u_1*v_1| +---+---+---+ |u_0*v_1| +-------+ |u_1*v_0| +---+---+---+---+ |q_1|q_0|q_{low}| +---+---+-------+ The "direct" method only needs q_0, but it needs it to be computed accurately, with contributions from all four products (*maybe* one could omit u_0*v0). While udiv_rnnd_preinv needs q_1 and a crude version of q_0, and gets by with only the top corner, omitting all products involving u_0 or v_0. So I doubt the direct method can be faster for 2/1 modulo. Regards, /Niels -- Niels Möller. PGP-encrypted email is preferred. Keyid 368C6677. Internet email is subject to wholesale government surveillance. _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org https://gmplib.org/mailman/listinfo/gmp-devel