ni...@lysator.liu.se (Niels Möller) writes: I think I have (re-)discovered a different schoolbook division algorithm. I'll describe the case that we only want the remainder. I'm fairly confident one can assemble the quotient as well, with a bit of extra book-keeping, but I'm less sure that it can be faster than a schoolbook division loop around 3/2 division. I agree this method could make a faster SB mpn_tdiv_r, but I am unconvinced that you can get the quotient with fewer operatons than the current code.
Perhaps one could base a SB mpn_tdiv_qr using these ideas on parallel computation of the new dn-limb remainder and the corresponding quotient limb. This would mainly move the quotient limb computation off of the critical path. A problem that might then show up is that final adjusting of the quotient limb might be tricky. We should also be aware of that SB mpn_tdiv_qr could be rewritten with speculative quotient limb computation, in effect (but not completely) removing it from the critical path. The idea is to compute the top partial remainder limbs first before invoking mpn_submul_1, and stick the *next* qlimb computation also before calling mpn_submul_1. You should implement a SB mpn_tdiv_r based on your ideas. It ought to outperform the current code most when the divisor is small and the dividend is fairly large. (Almost) unrelated to performance is the side-chanell leakage properties. The new code could (as you mention) probably be easy to make side-chanell silent. -- Torbjörn _______________________________________________ gmp-devel mailing list gmp-devel@gmplib.org http://gmplib.org/mailman/listinfo/gmp-devel