ni...@lysator.liu.se (Niels Möller) writes: I seriously doubt that it works for sbpi1_bdiv_qr. I think it does work for sbpi1_bdiv_q. That function doesn't use the trick of delaying the addition of the carry limbs, it just applies the carry limbs one at a time with mpn_add_1, like cy = mpn_addmul_1 (np, dp, dn, q); mpn_add_1 (np + dn, np + dn, i, cy); Which is probably costing many cycles.
I would like a structure like redc's, i.e. iterate nn-dn times q = ... hi = mpn_addmul_1 (...) blah I mean, a single outer loop for each quotient limb. This could be made to work if blah is an mpn_add_1, but that's quite expensive. It can probably not be made to work without applying mpn_addmul_1's carry-out to the partial remainder. I think the right way would be to apply it manually, and keep 2-limb carry, one from mpn_addmul_1 and one from the addition of the return value to the partial remainder... Something like this: iterate nn-dn times q = np[0] * dinv hi = mpn_addmul_1 (np, dp, dn, q) hi += cy cy = hi < cy // can this be true? hi += np[dn] cy2 += hi < np[dn] np[dn] = hi np += 1; -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel